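Diagnosis process
The crash call stack is shown below. Unwinding the stack by hand gives essentially the same result as bt: a terminal program calls all the way down into vmalloc and dies inside it. Note the out_of_memory frame, which suggests that system memory was close to exhausted.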
[0]kdb> bt
Stack traceback for pid 693
0xffff880362416740 693 1 1 0 R 0xffff8803624171c0 *vtysh
[15792.666500] ffff88042d6bb818 0000000000000018 ffffffff818604c0 ffff88042d6bb858
[15792.755452] ffffffff8109f38e ffff88042d6bb848 ffffffff817a6588 000000000040adbd
[15792.844406] ffffffff818604c0 ffffffff818607b0 ffff880054d36fa8 ffff88042d6bb868
[15792.933360] Call Trace:
[15792.962596] [<ffffffff8109f38e>] ? notifier_call_chain+0x4e/0x80
[15793.035508] [<ffffffff8109f45a>] ? atomic_notifier_call_chain+0x1a/0x20
[15793.115702] [<ffffffff81147eb1>] ? panic+0x101/0x21e
[15793.176131] [<ffffffff8114ddce>] ? out_of_memory+0x46e/0x470
[15793.244884] [<ffffffff81153b96>] ? __alloc_pages_nodemask+0x9b6/0xb30
[15793.322995] [<ffffffff8119376c>] ? alloc_pages_current+0x8c/0x110
[15793.396945] [<ffffffff8118832f>] ? __vmalloc_node_range+0x16f/0x290
[15793.472977] [<ffffffff813af519>] ? n_tty_open+0x19/0xe0
[15793.536526] [<ffffffff811884f4>] ? vmalloc+0x54/0x60
[15793.596958] [<ffffffff813af519>] ? n_tty_open+0x19/0xe0
[15793.660508] [<ffffffff813af519>] ? n_tty_open+0x19/0xe0
[15793.724060] [<ffffffff813b38f2>] ? tty_ldisc_open.isra.2+0x32/0x60
[15793.799050] [<ffffffff813b41c0>] ? tty_ldisc_hangup+0x1d0/0x200
[15793.870922] [<ffffffff813ab592>] ? __tty_hangup+0x2b2/0x3e0
[15793.938634] [<ffffffff813abe29>] ? disassociate_ctty.part.25+0x49/0x2a0
[15794.018825] [<ffffffff813ac0f9>] ? disassociate_ctty+0x29/0x30
[15794.089657] [<ffffffff81083792>] ? do_exit+0x732/0xa80
[15794.152168] [<ffffffff8108c1cf>] ? recalc_sigpending+0x1f/0x60
[15794.222996] [<ffffffff81083b73>] ? do_group_exit+0x43/0xb0
[15794.289668] [<ffffffff8108f2c8>] ? get_signal+0x278/0x5e0
[15794.355300] [<ffffffff81005358>] ? do_signal+0x28/0x6d0
[15794.418853] [<ffffffff8100217d>] ? exit_to_usermode_loop+0x6d/0xa0
[15794.493841] [<ffffffff81002a88>] ? syscall_return_slowpath+0x48/0x60
[15794.570915] [<ffffffff8152dc6b>] ? int_ret_from_sys_call+0x25/0x8f
r15 = 0x0000000000000000 r14 = 0xffffffff8cff41c0
r13 = 0x0000000000000000 r12 = 0xffffffff8188ad00
bp = 0xffff88042d6bb818 bx = 0x00000000fffffffe
r11 = 0x000000000000ae76 r10 = 0x0000000000039940
r9 = 0xffffffff818bbd00 r8 = 0x0000000000000000
ax = 0x0000000000000001 cx = 0x00000000ffffffff
dx = 0xffffffff8cff41c0 si = 0x0000000000000000
di = 0xffffffff818bbd00 orig_ax = 0xffffffffffffffff
ip = 0xffffffff8130fad9 cs = 0x0000000000000010
flags = 0x0000000000000046 sp = 0xffff88042d6bb800
ss = 0x0000000000000018 &regs = 0xffff88042d6bb768
[0]kdb> 0xffffffff8130fad9
0xffffffff8130fad9 = 0xffffffff8130fad9 (kdb_panic+0x29)
[0]kdb> md 0xffff88042d6bb818 1
0xffff88042d6bb818 ffff88042d6bb858 ffffffff8109f38e X.k-............
[0]kdb> ffffffff8109f38e
ffffffff8109f38e = 0xffffffff8109f38e (notifier_call_chain+0x4e)
[0]kdb> md ffff88042d6bb858 1
0xffff88042d6bb858 ffff88042d6bb868 ffffffff8109f45a h.k-....Z.......
[0]kdb> ffffffff8109f45a
ffffffff8109f45a = 0xffffffff8109f45a (atomic_notifier_call_chain+0x1a)
[0]kdb> md ffff88042d6bb868 1
0xffff88042d6bb868 ffff88042d6bb8e8 ffffffff81147eb1 ..k-.....~......
[0]kdb> ffffffff81147eb1
ffffffff81147eb1 = 0xffffffff81147eb1 (panic+0x101)
[0]kdb> md ffff88042d6bb8e8 1
0xffff88042d6bb8e8 ffff88042d6bb948 ffffffff8114ddce H.k-............
[0]kdb> ffffffff8114ddce
ffffffff8114ddce = 0xffffffff8114ddce (out_of_memory+0x46e)
[0]kdb> md ffff88042d6bb948 1
0xffff88042d6bb948 ffff88042d6bba98 ffffffff81153b96 ..k-.....;......
[0]kdb> ffffffff81153b96
ffffffff81153b96 = 0xffffffff81153b96 (__alloc_pages_nodemask+0x9b6)
[0]kdb> md ffff88042d6bba98 1
0xffff88042d6bba98 ffff88042d6bbae8 ffffffff8119376c ..k-....l7......
[0]kdb> ffffffff8119376c
ffffffff8119376c = 0xffffffff8119376c (alloc_pages_current+0x8c)
[0]kdb> md ffff88042d6bbae8 1
0xffff88042d6bbae8 ffff88042d6bbb68 ffffffff8118832f h.k-..../.......
[0]kdb> ffffffff8118832f
ffffffff8118832f = 0xffffffff8118832f (__vmalloc_node_range+0x16f)
[0]kdb> md ffff88042d6bbb68 1
0xffff88042d6bbb68 ffff88042d6bbb98 ffffffff811884f4 ..k-............
[0]kdb> ffffffff811884f4
ffffffff811884f4 = 0xffffffff811884f4 (vmalloc+0x54)
[0]kdb> md ffff88042d6bbb98 1
0xffff88042d6bbb98 ffff88042d6bbbb8 ffffffff813af519 ..k-......:.....
[0]kdb> ffffffff813af519
ffffffff813af519 = 0xffffffff813af519 (n_tty_open+0x19)
[0]kdb> md ffff88042d6bbbb8 1
0xffff88042d6bbbb8 ffff88042d6bbbd8 ffffffff813b38f2 ..k-.....8;.....
[0]kdb> ffffffff813b38f2
ffffffff813b38f2 = 0xffffffff813b38f2 (tty_ldisc_open.isra.2+0x32)
[0]kdb> md ffff88042d6bbbd8 1
0xffff88042d6bbbd8 ffff88042d6bbc08 ffffffff813b41c0 ..k-.....A;.....
[0]kdb> ffffffff813b41c0
ffffffff813b41c0 = 0xffffffff813b41c0 (tty_ldisc_hangup+0x1d0)
[0]kdb> 0xffffffff813b41c0 1
0xffffffff813b41c0 = 0xffffffff813b41c0 (tty_ldisc_hangup+0x1d0)
[0]kdb> ffffffff813b41c0
ffffffff813b41c0 = 0xffffffff813b41c0 (tty_ldisc_hangup+0x1d0)
[0]kdb> md ffff88042d6bbc08 1
0xffff88042d6bbc08 ffff88042d6bbc68 ffffffff813ab592 h.k-......:.....
[0]kdb> ffffffff813ab592
ffffffff813ab592 = 0xffffffff813ab592 (__tty_hangup+0x2b2)
[0]kdb> md ffff88042d6bbc68 1
0xffff88042d6bbc68 ffff88042d6bbc98 ffffffff813abe29 ..k-....).:.....
[0]kdb> ffffffff813abe29
ffffffff813abe29 = 0xffffffff813abe29 (disassociate_ctty.part.25+0x49)
[0]kdb> md ffff88042d6bbc98 1
0xffff88042d6bbc98 ffff88042d6bbca8 ffffffff813ac0f9 ..k-......:.....
[0]kdb> ffffffff813ac0f9 1
ffffffff813ac0f9 = 0xffffffff813ac0f9 (disassociate_ctty+0x29)
[0]kdb> md ffff88042d6bbca8 1
0xffff88042d6bbca8 ffff88042d6bbd28 ffffffff81083792 (.k-.....7......
[0]kdb> ffffffff81083792
ffffffff81083792 = 0xffffffff81083792 (do_exit+0x732)
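The manual unwinding above relies on the frame-pointer layout: each md <rbp> 1 prints the caller's saved frame pointer followed by the return address, and the saved frame pointer is then used as the next address to dump. A minimal Python sketch of that walk over the first four frames of the session follows. FRAMES, SYMBOLS and walk_frames are names made up for this illustration, and the symbol table is simply copied from the kdb lookups rather than resolved from a real symbol file.

```python
# (saved_rbp, return_address) pairs, copied from the `md <rbp> 1` output above.
FRAMES = {
    0xffff88042d6bb818: (0xffff88042d6bb858, 0xffffffff8109f38e),
    0xffff88042d6bb858: (0xffff88042d6bb868, 0xffffffff8109f45a),
    0xffff88042d6bb868: (0xffff88042d6bb8e8, 0xffffffff81147eb1),
    0xffff88042d6bb8e8: (0xffff88042d6bb948, 0xffffffff8114ddce),
}

# Address-to-symbol results, copied from the kdb lookups above.
SYMBOLS = {
    0xffffffff8109f38e: "notifier_call_chain+0x4e",
    0xffffffff8109f45a: "atomic_notifier_call_chain+0x1a",
    0xffffffff81147eb1: "panic+0x101",
    0xffffffff8114ddce: "out_of_memory+0x46e",
}


def walk_frames(rbp, frames, symbols):
    """Follow the saved-rbp chain and collect one return address per frame."""
    trace = []
    while rbp in frames:
        saved_rbp, return_address = frames[rbp]
        trace.append(symbols.get(return_address, hex(return_address)))
        rbp = saved_rbp  # the caller's frame is the next one to inspect
    return trace


# Start from the bp register value reported by bt.
TRACE = walk_frames(0xffff88042d6bb818, FRAMES, SYMBOLS)
for entry in TRACE:
    print(entry)
```

The walk stops as soon as it reaches a frame address that is not in the captured data, which is the same point at which the manual session would need another md command.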
summary shows that only about 144 MB of the 16 GB of memory is still free. Slab takes up 12317492 kB, of which the unreclaimable part, SUnreclaim, is as high as 12296600 kB.
[0]kdb> summary
sysname Linux
release 4.19.90-20
version
machine x86_64
nodename ...
domainname (none)
ccversion gcc version 5.4.0 (...)
uptime 04:04
load avg 61.45 60.49 60.02
MemTotal: 16955124 kB
MemFree: 144116 kB
MemAvailable: 70124 kB
Buffers: 40 kB
Cached: 613968 kB
SwapCached: 0 kB
Active: 354580 kB
Inactive: 96308 kB
Active(anon): 354560 kB
Inactive(anon): 96284 kB
Active(file): 20 kB
Inactive(file): 24 kB
Unevictable: 517652 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 354656 kB
Mapped: 41356 kB
Shmem: 96312 kB
Slab: 12317492 kB
SReclaimable: 20892 kB
SUnreclaim: 12296600 kB
KernelStack: 3216 kB
PageTables: 86772 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 8477560 kB
Committed_AS: 2659584 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
AnonHugePages: 38912 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
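To put these numbers in proportion, the sketch below parses a few of the fields from the summary output and computes their share of MemTotal. It is only a quick sanity check: parse_meminfo and share are hypothetical helper names, and the input is a hand-copied subset of the lines above.

```python
# A hand-copied subset of the `summary` output above.
MEMINFO = """\
MemTotal: 16955124 kB
MemFree: 144116 kB
Slab: 12317492 kB
SReclaimable: 20892 kB
SUnreclaim: 12296600 kB
"""


def parse_meminfo(text):
    """Turn 'Name: value kB' lines into a {name: value_in_kB} dict."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def share(fields, key):
    """Percentage of MemTotal taken by the given field."""
    return 100.0 * fields[key] / fields["MemTotal"]


info = parse_meminfo(MEMINFO)
for key in ("MemFree", "Slab", "SUnreclaim"):
    print(f"{key:12s} {info[key]:>10d} kB  {share(info, key):5.1f}% of MemTotal")
```

Slab comes out at roughly 73% of total memory, almost all of it unreclaimable, while free memory is below 1%. The two slab components also add up exactly to the Slab total, which confirms the figures were copied consistently.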
Part of the dmesg output:
[0]kdb> dmesg 200
<3>[14579.560239] Out of memory: Kill process 16281 (...) score 0 or sacrifice child
<3>[14579.560251] Killed process 16281 (...) total-vm:439208kB, anon-rss:2048kB, file-rss:1656kB
<2>[14579.562861] %%--.../SYSTEM/2/SYSLOG(l): Failed to allocate memory for [vtysh] process,please check system memory usage details.
<3>[14579.562866] Out of memory: Kill process 7712 (...) score 0 or sacrifice child
<3>[14579.562877] Killed process 7712 (...) total-vm:437196kB, anon-rss:2188kB, file-rss:1352kB
<2>[14579.563225] %%--.../SYSTEM/2/SYSLOG(l): Failed to allocate memory for [vtysh] process,please check system memory usage details.
<3>[14579.563228] Out of memory: Kill process 7731 (...) score 0 or sacrifice child
<3>[14579.563237] Killed process 7731 (...) total-vm:439208kB, anon-rss:2040kB, file-rss:1424kB
<2>[14579.564227] %%--.../SYSTEM/2/SYSLOG(l): Failed to allocate memory for [vtysh] process,please check system memory usage details.
<3>[14579.564234] Out of memory: Kill process 5351 (...) score 0 or sacrifice child
<3>[14579.564241] Killed process 5351 (...) total-vm:439780kB, anon-rss:2044kB, file-rss:1412kB
...
<4>[14579.571210] vtysh invoked oom-killer: gfp_mask=0x24002c2, order=0, oom_score_adj=0
...
<0>[14579.571720] Kernel panic - not syncing: Out of memory and no killable processes...
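The oom-killer has been woken up and the system has already started killing processes. The line Kernel panic - not syncing: Out of memory and no killable processes... means the oom-killer has run out of processes to kill; the kernel can no longer free enough memory through it, and the system finally crashes.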
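The log also shows how little each kill could give back. The sketch below sums the resident memory (anon-rss plus file-rss) of the four killed processes visible in the excerpt. The regular expression and the summarize helper are written for this excerpt only and are not a general dmesg parser; treating RSS as the amount a kill can free is also an approximation.

```python
import re

# The "Killed process" and panic lines from the dmesg excerpt above.
DMESG = """\
<3>[14579.560251] Killed process 16281 (...) total-vm:439208kB, anon-rss:2048kB, file-rss:1656kB
<3>[14579.562877] Killed process 7712 (...) total-vm:437196kB, anon-rss:2188kB, file-rss:1352kB
<3>[14579.563237] Killed process 7731 (...) total-vm:439208kB, anon-rss:2040kB, file-rss:1424kB
<3>[14579.564241] Killed process 5351 (...) total-vm:439780kB, anon-rss:2044kB, file-rss:1412kB
<0>[14579.571720] Kernel panic - not syncing: Out of memory and no killable processes...
"""

KILLED = re.compile(r"Killed process (\d+) .*anon-rss:(\d+)kB, file-rss:(\d+)kB")


def summarize(log):
    """Return ([(pid, rss_kb), ...], panicked) for the given log text."""
    kills = []
    panicked = False
    for line in log.splitlines():
        match = KILLED.search(line)
        if match:
            pid, anon_kb, file_kb = map(int, match.groups())
            kills.append((pid, anon_kb + file_kb))
        elif "Kernel panic" in line:
            panicked = True
    return kills, panicked


kills, panicked = summarize(DMESG)
freed_kb = sum(rss_kb for _, rss_kb in kills)
print(f"{len(kills)} processes killed, at most {freed_kb} kB resident, panic={panicked}")
```

The four victims together held about 14 MB of resident memory, which is negligible next to roughly 12 GB of unreclaimable slab. Killing user processes cannot give that slab memory back, so the oom-killer runs out of candidates long before the pressure is relieved.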
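After checking, the device was handling a high number of concurrent connections, so slab really does need that much memory, and a memory leak can largely be ruled out. The diagnosis is therefore a crash caused by insufficient memory, and the fix is to cap memory usage for the specific feature.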
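An open question
Why does running out of memory crash the system? Put differently, why does the crash happen inside vmalloc? If memory is short, shouldn't vmalloc simply return NULL? I originally assumed that vmalloc had failed and returned NULL, that the caller had not checked the return value and had dereferenced NULL, and that this had ended in a null pointer error. That is not what happened. Is vmalloc not robust, then? That seemed very unlikely. So I asked DeepSeek. I cannot tell for sure whether its answer is correct, but it convinced me, and I think it is probably right.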
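Root cause analysis:
- vmalloc does return NULL when memory is short, but only if it can actually reach the failure-handling path.
- When the system is severely out of memory, vmalloc's own execution may need to allocate memory (for example, to create page tables or management data structures), and these internal operations can fail for lack of memory as well. When the OOM killer is triggered but fails to free memory, the kernel calls panic() directly to halt the system instead of returning to vmalloc.
Conclusion: the page allocator called from inside vmalloc triggers the OOM killer, the OOM killer cannot free enough memory, and the kernel finally calls panic() itself.
📌 Key takeaway: once memory is exhausted beyond a certain point, even the kernel's own basic operations (such as allocating management data structures) can fail, and the system crashes outright instead of returning an error gracefully. This is similar to a cascading failure in which "the pilot cannot operate the landing system once the oxygen mask fails".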
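DeepSeek's explanation makes sense. I had not realized that vmalloc itself has to allocate memory internally, and with that in mind the picture fits: vmalloc needs memory to do its work, and if even that much is unavailable, the kernel has no option left but to trigger the crash itself. In the backtrace above, this is the path from vmalloc through __vmalloc_node_range, alloc_pages_current and __alloc_pages_nodemask into out_of_memory and finally panic.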
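The control flow can be pictured with a toy model. This is not kernel code: the real page allocator also tries reclaim, compaction and retries before it reaches the OOM killer. Only the name out_of_memory and the panic message come from the logs above; KernelPanic, alloc_pages, try_alloc and all the kB figures are hypothetical.

```python
class KernelPanic(Exception):
    """Stands in for panic(): control never returns to the allocating caller."""


def out_of_memory(victims):
    """Kill the largest remaining process and return the memory it held (kB)."""
    if not victims:
        # Mirrors the message seen in dmesg when nothing is left to kill.
        raise KernelPanic("Out of memory and no killable processes...")
    pid = max(victims, key=victims.get)
    return victims.pop(pid)


def alloc_pages(free_kb, need_kb, victims):
    """Keep invoking the OOM path until the request fits; return what is left."""
    while free_kb < need_kb:
        free_kb += out_of_memory(victims)
    return free_kb - need_kb


def try_alloc(free_kb, need_kb, victims):
    """Report whether the allocation succeeded or the model 'panicked'."""
    try:
        alloc_pages(free_kb, need_kb, victims)
        return "allocated"
    except KernelPanic as exc:
        return "panic: " + str(exc)


# One small victim is enough for a 4 kB request ...
print(try_alloc(0, 4, {16281: 3704}))
# ... but with nothing left to kill, the failure never reaches the caller as NULL.
print(try_alloc(0, 4, {}))
```

In this model there is no branch in which alloc_pages hands a failure back to its caller: either memory is found or the whole system stops. That matches what the backtrace shows, where panic is reached from inside the allocation path before vmalloc can return.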
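Conclusion
Under high concurrency the device maintained a large number of sessions, which consumed so much memory that it was nearly exhausted. At that point a terminal-related process (vtysh) was, according to the backtrace, exiting on its way back from a system call, and the tty hangup on that path requested memory through vmalloc. Memory was so depleted, however, that even basic memory-management operations could barely proceed. To keep going, the page allocation inside vmalloc woke the OOM killer to kill processes, but even after several processes had been killed there was still not enough memory, and before vmalloc could return NULL the kernel had to trigger the crash itself.
The reasoning above may not be entirely right, but the conclusion should be pretty close. Probably. Hmm.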