Reputation: 1
I'm using Debian jessie with kernel 3.16.39-1:
# apt-cache policy linux-image-3.16.0-4-amd64
linux-image-3.16.0-4-amd64:
Installed: 3.16.39-1
Candidate: 3.16.39-1
Version table:
*** 3.16.39-1 0
500 http://ftp.fr.debian.org/debian/ jessie/main amd64 Packages
100 /var/lib/dpkg/status
This machine uses 2 bonding interfaces, and irqbalance is running on it.
Under network load (12 Gb/s on bond1) I got the following kernel panic:
kernel: [26339.017497] Call Trace:
kernel: [26339.017499] <IRQ> [<ffffffff81514c11>] ? dump_stack+0x5d/0x78
kernel: [26339.017509] [<ffffffff81144a3f>] ? warn_alloc_failed+0xdf/0x130
kernel: [26339.017513] [<ffffffff810a949d>] ? __wake_up_sync_key+0x3d/0x60
kernel: [26339.017515] [<ffffffff81148daf>] ? __alloc_pages_nodemask+0x8ef/0xb50
kernel: [26339.017519] [<ffffffff8147eaff>] ? tcp_v4_do_rcv+0x1af/0x4c0
kernel: [26339.017524] [<ffffffff81455b66>] ? nf_hook_slow+0x76/0x130
kernel: [26339.017528] [<ffffffff811883ad>] ? alloc_pages_current+0x9d/0x150
kernel: [26339.017531] [<ffffffff81412d7b>] ? __netdev_alloc_frag+0x8b/0x140
kernel: [26339.017534] [<ffffffff8141913f>] ? __netdev_alloc_skb+0x6f/0xf0
kernel: [26339.017558] [<ffffffffa0146a0d>] ? ixgbe_clean_rx_irq+0x10d/0xb70 [ixgbe]
kernel: [26339.017564] [<ffffffffa0148198>] ? ixgbe_poll+0x488/0x860 [ixgbe]
kernel: [26339.017567] [<ffffffff8108c9ad>] ? hrtimer_get_next_event+0xad/0xc0
kernel: [26339.017570] [<ffffffff81425509>] ? net_rx_action+0x129/0x250
kernel: [26339.017573] [<ffffffff8106d911>] ? __do_softirq+0xf1/0x2d0
kernel: [26339.017575] [<ffffffff8106dd25>] ? irq_exit+0x95/0xa0
kernel: [26339.017578] [<ffffffff8151dbe2>] ? do_IRQ+0x52/0xe0
kernel: [26339.017582] [<ffffffff8151ba2d>] ? common_interrupt+0x6d/0x6d
kernel: [26339.017583] <EOI> [<ffffffff8108c31d>] ? __hrtimer_start_range_ns+0x1cd/0x3a0
kernel: [26339.017588] [<ffffffff813e32a2>] ? cpuidle_enter_state+0x52/0xc0
kernel: [26339.017590] [<ffffffff813e3298>] ? cpuidle_enter_state+0x48/0xc0
kernel: [26339.017592] [<ffffffff810a9b28>] ? cpu_startup_entry+0x328/0x470
kernel: [26339.017595] [<ffffffff81043fdf>] ? start_secondary+0x20f/0x2d0
[....]
kernel: [26339.017647] swapper/13: page allocation failure: order:0, mode:0x20
kernel: [26339.017667] active_anon:2860787 inactive_anon:290478 isolated_anon:15723
kernel: [26339.017667] active_file:284318 inactive_file:151176 isolated_file:0
kernel: [26339.017667] unevictable:20736 dirty:24804 writeback:4297 unstable:0
kernel: [26339.017667] free:23079 slab_reclaimable:27293 slab_unreclaimable:86672
kernel: [26339.017667] mapped:22343 shmem:413 pagetables:10111 bounce:0
kernel: [26339.017667] free_cma:0
kernel: [26339.017670] Node 0 DMA free:15896kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
kernel: [26339.017675] lowmem_reserve[]: 0 3191 16016 16016
kernel: [26339.017680] Node 0 DMA32 free:56312kB min:13456kB low:16820kB high:20184kB active_anon:589468kB inactive_anon:141384kB active_file:1132312kB inactive_file:597576kB unevictable:16616kB isolated(anon):0kB isolated(file):0kB present:3345344kB managed:3270860kB mlocked:16616kB dirty:33860kB writeback:4288kB mapped:18616kB shmem:180kB slab_reclaimable:17036kB slab_unreclaimable:83696kB kernel_stack:34016kB pagetables:8384kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: [26339.017686] lowmem_reserve[]: 0 0 12824 12824
kernel: [26339.017691] Node 0 Normal free:20108kB min:54060kB low:67572kB high:81088kB active_anon:10853680kB inactive_anon:1020528kB active_file:4960kB inactive_file:7128kB unevictable:66328kB isolated(anon):62892kB isolated(file):0kB present:13369344kB managed:13131968kB mlocked:66328kB dirty:65356kB writeback:12900kB mapped:70756kB shmem:1472kB slab_reclaimable:92136kB slab_unreclaimable:262992kB kernel_stack:10880kB pagetables:32060kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:4275 all_unreclaimable? no
kernel: [26339.017696] lowmem_reserve[]: 0 0 0 0
kernel: [26339.017701] Node 0 DMA: 0*4kB 0000000000000020 ffff88042f1a3bf0
kernel: [26339.017706] 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15896kB
kernel: [26339.017723] Node 0 DMA32: 250*4kB
kernel: [26339.017726] ffffffff81144a3f 0000000000000000 0000000000000000 ffffffff00000002
kernel: [26339.017730] (EM) 967*8kB (UEM) 2628*16kB (UM) 83*32kB (UMR) 15*64kB (R) 8*128kB (R) 4*256kB (R) 0*512kB 0*1024kB 0*2048kB <4>[26339.017747] swapper/0: page allocation failure: order:0, mode:0x20
kernel: [26339.017748] 0*4096kB = 56448kB
kernel: [26339.017751] Node 0 Normal: 3653*4kB (M) 0*8kB 0*16kB 1*32kB (R) 0*64kB 1*128kB (R) 0*256kB 1*512kB (R) 0*1024kB 1*2048kB (R) 0*4096kB = 17332kB
kernel: [26339.017767] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
kernel: [26339.017768] 466495 total pagecache pages
kernel: [26339.017769] 10046 pages in swap cache
kernel: [26339.017771] Swap cache stats: add 4415081, delete 4405035, find 1682225/2488531
kernel: [26339.017772] Free swap = 19301256kB
kernel: [26339.017773] Total swap = 19764220kB
kernel: [26339.017774] 4182667 pages RAM
kernel: [26339.017775] 0 pages HighMem/MovableOnly
kernel: [26339.017776] 59344 pages reserved
kernel: [26339.017777] 0 pages hwpoisoned
The kernel panic shows messages related to IRQs and ixgbe.
Could someone give me some advice on how to solve this problem? The server had been running fine for 2 hours under the same network load without any problems.
Regards,
Upvotes: 0
Views: 910
Reputation: 9
The call trace itself does not show any debug information related to a kernel panic.
kernel: [26339.017509] [<ffffffff81144a3f>] ? warn_alloc_failed+0xdf/0x130
kernel: [26339.017513] [<ffffffff810a949d>] ? __wake_up_sync_key+0x3d/0x60
kernel: [26339.017515] [<ffffffff81148daf>] ? __alloc_pages_nodemask+0x8ef/0xb50
kernel: [26339.017519] [<ffffffff8147eaff>] ? tcp_v4_do_rcv+0x1af/0x4c0
kernel: [26339.017524] [<ffffffff81455b66>] ? nf_hook_slow+0x76/0x130
kernel: [26339.017528] [<ffffffff811883ad>] ? alloc_pages_current+0x9d/0x150
kernel: [26339.017531] [<ffffffff81412d7b>] ? __netdev_alloc_frag+0x8b/0x140
kernel: [26339.017534] [<ffffffff8141913f>] ? __netdev_alloc_skb+0x6f/0xf0
kernel: [26339.017558] [<ffffffffa0146a0d>] ? ixgbe_clean_rx_irq+0x10d/0xb70 [ixgbe]
kernel: [26339.017564] [<ffffffffa0148198>] ? ixgbe_poll+0x488/0x860 [ixgbe]
kernel: [26339.017567] [<ffffffff8108c9ad>] ? hrtimer_get_next_event+0xad/0xc0
Rather than the call trace above, the signature below is the sign of page starvation.
kernel: [26339.017667] active_anon:2860787 inactive_anon:290478 isolated_anon:15723
kernel: [26339.017667] active_file:284318 inactive_file:151176 isolated_file:0
kernel: [26339.017667] unevictable:20736 dirty:24804 writeback:4297 unstable:0
kernel: [26339.017667] free:23079 slab_reclaimable:27293 slab_unreclaimable:86672
kernel: [26339.017667] mapped:22343 shmem:413 pagetables:10111 bounce:0
kernel: [26339.017667] free_cma:0
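The counters above are page counts, while the per-zone lines in the question report free memory and watermarks in kB. As a rough sketch of how to read them together, the following converts the page counts and compares each zone's free value with its min watermark. All numbers are copied from the log in the question; the 4 kB page size is an assumption (the usual value on amd64), and the function names are made up for this example.

```python
# Values copied from the log in the question.
# PAGE_KB = 4 is an assumption: the usual page size on amd64.
PAGE_KB = 4

free_pages = 23079       # "free:23079"
total_pages = 4182667    # "4182667 pages RAM"

# Per-zone "free" and "min" values in kB, from the "Node 0 ..." lines.
zones = {
    "DMA":    {"free": 15896, "min": 64},
    "DMA32":  {"free": 56312, "min": 13456},
    "Normal": {"free": 20108, "min": 54060},
}


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count to kB."""
    return pages * page_kb


def zones_below_min(zones):
    """Return the names of zones whose free memory is under the min watermark."""
    return [name for name, z in zones.items() if z["free"] < z["min"]]


free_kb = pages_to_kb(free_pages)
free_pct = 100.0 * free_pages / total_pages

print(f"free memory: {free_kb} kB ({free_pct:.2f}% of RAM)")
print("zones below min watermark:", zones_below_min(zones))
```

With these figures the converted free count equals the sum of the three per-zone free values, which is a quick sanity check that the 4 kB page size assumption fits this log.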
As the "inactive_anon:290478, inactive_file:151176" signature indicates, there is a high possibility of page starvation; in the per-zone lines of the question, the Normal zone shows free:20108kB against min:54060kB. With the following configuration change, you can find out whether your system is suffering from a kernel memory leak.
diff --git a/arch/arm/configs/pompeii_defconfig b/arch/arm/configs/pompeii_defconfig
index 2e97f97..aac678a 100644
--- a/arch/arm/configs/pompeii_defconfig
+++ b/arch/arm/configs/pompeii_defconfig
@@ -754,8 +754,8 @@ CONFIG_SLUB_DEBUG_PANIC_ON=y
 CONFIG_SLUB_DEBUG_ON=y
 CONFIG_DEBUG_KMEMLEAK=y
-CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=4000
-CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y
+CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=40000
+# CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
 CONFIG_DEBUG_STACK_USAGE=y
 CONFIG_DEBUG_VM=y
 CONFIG_DEBUG_MEMORY_INIT=y
Make sure to add "kmemleak=on" to the kernel command line.
Run the following command over a 10-minute period:
echo scan > /sys/kernel/debug/kmemleak
The kernel memory leak report can then be displayed with the command below:
cat /sys/kernel/debug/kmemleak
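The scan-then-read steps above could be scripted roughly as follows. This is only a sketch: it assumes debugfs is mounted at /sys/kernel/debug, that the script runs as root on a kernel built with CONFIG_DEBUG_KMEMLEAK, and it interprets "for 10 mins period" as one scan per minute for ten minutes, which is my reading rather than something stated above. The function names are hypothetical.

```python
import os
import time
from pathlib import Path

# Assumption: debugfs is mounted at the usual location.
KMEMLEAK = Path("/sys/kernel/debug/kmemleak")


def kmemleak_enabled_on_cmdline(cmdline: str) -> bool:
    """Check whether kmemleak=on appears on the kernel command line."""
    return "kmemleak=on" in cmdline.split()


def trigger_scan(path=KMEMLEAK):
    """Equivalent of: echo scan > /sys/kernel/debug/kmemleak"""
    Path(path).write_text("scan\n")


def read_report(path=KMEMLEAK):
    """Equivalent of: cat /sys/kernel/debug/kmemleak"""
    return Path(path).read_text()


def scan_periodically(rounds, interval_s, path=KMEMLEAK):
    """Trigger a scan `rounds` times, sleeping between scans, then read the report."""
    for _ in range(rounds):
        trigger_scan(path)
        time.sleep(interval_s)
    return read_report(path)


def main():
    cmdline_file = Path("/proc/cmdline")
    cmdline = cmdline_file.read_text() if cmdline_file.exists() else ""
    if not kmemleak_enabled_on_cmdline(cmdline):
        print("kmemleak=on not found on the kernel command line")
        return
    if not os.access(KMEMLEAK, os.R_OK | os.W_OK):
        print(f"{KMEMLEAK} is not accessible (is debugfs mounted, are you root?)")
        return
    # Assumed schedule: one scan per minute for ten minutes.
    report = scan_periodically(rounds=10, interval_s=60)
    print(report if report else "no leaks reported")


if __name__ == "__main__":
    main()
```

An empty report only means kmemleak found no unreferenced objects during those scans; it does not by itself rule out other causes of the allocation failures.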
Upvotes: 1