Reputation: 8268
I am using a vanilla Linux kernel v6.5 on qemu. I have developed a dummy kernel module with the code below, intending to trigger a soft lockup:
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

static int __init soft_lockup_init(void) {
    printk(KERN_INFO "Soft lockup module loaded\n");
    // Infinite loop to trigger soft lockup
    while (1) {
        // To ensure that the loop isn't optimized out by the compiler
        barrier();
    }
    return 0;
}

module_init(soft_lockup_init);
MODULE_LICENSE("GPL");
I have enabled CONFIG_SOFTLOCKUP_DETECTOR=y and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y. On a qemu x86_64 system, when I load the above kernel module, I do see a panic due to the soft lockup:
watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [insmod:102]
and the calltrace also looks as expected:
[ 60.972181] Call Trace:
[ 60.972181] <IRQ>
[ 60.972181] ? watchdog_timer_fn+0x1ad/0x210
[ 60.972181] ? __pfx_watchdog_timer_fn+0x10/0x10
[ 60.972181] ? __hrtimer_run_queues+0x125/0x2d0
[ 60.972181] ? hrtimer_interrupt+0xfb/0x240
[ 60.972181] ? __sysvec_apic_timer_interrupt+0x5e/0x130
[ 60.972181] ? sysvec_apic_timer_interrupt+0x69/0x90
[ 60.972181] </IRQ>
But when I load the same kernel module on qemu aarch64, nothing happens, no matter how long I wait: no soft lockup is detected. The hardlockup detector, on the other hand, works on both x86_64 and arm64. Since the softlockup detector also uses the same watchdog_timer_fn() as the hardlockup detector, I expected it to work fine on arm64 as well. Is this a bug?
Another question: why does a simple infinite loop in module_init() result in a soft lockup at all? I thought it could be preempted, so other tasks would still be able to use the CPU core where the insmod was done. Is that not correct?
EDIT
I attached GDB to both the x86_64 and arm64 qemu instances. On x86_64, once I insmod my faulty module with the infinite loop, the core that serviced the insmod remains stuck there forever. This can be confirmed by the stack backtrace:
>>> bt
#0 0xffffffffc0006020 in ?? ()
#1 0xffffffff81001a33 in do_one_initcall (fn=0xffffffffc0006010) at init/main.c:1232
#2 0xffffffff8112a03f in do_init_module (mod=0xffffffffc0002040) at kernel/module/main.c:2530
#3 0xffffffff8112c07b in load_module (info=0xffffffff8200148a <asm_sysvec_apic_timer_interrupt+26>, info@entry=0xffffc900001fbde8, uargs=0x0 <fixed_percpu_data>, uargs@entry=0x55c5e26e72a0 "", flags=17617392, flags@entry=0) at kernel/module/main.c:2981
#4 0xffffffff8112c556 in init_module_from_file (f=f@entry=0xffff88810143b300, uargs=uargs@entry=0x55c5e26e72a0 "", flags=flags@entry=0) at kernel/module/main.c:3148
#5 0xffffffff8112c72c in idempotent_init_module (f=f@entry=0xffff88810143b300, uargs=uargs@entry=0x55c5e26e72a0 "", flags=flags@entry=0) at kernel/module/main.c:3165
#6 0xffffffff8112c8a6 in __do_sys_finit_module (flags=0, uargs=0x55c5e26e72a0 "", fd=<optimized out>) at kernel/module/main.c:3186
#7 __se_sys_finit_module (flags=0, uargs=94308395807392, fd=<optimized out>) at kernel/module/main.c:3169
#8 __x64_sys_finit_module (regs=<optimized out>) at kernel/module/main.c:3169
#9 0xffffffff81eacf4f in do_syscall_x64 (nr=<optimized out>, regs=0xffffc900001fbf58) at arch/x86/entry/common.c:50
#10 do_syscall_64 (regs=0xffffc900001fbf58, nr=<optimized out>) at arch/x86/entry/common.c:80
#11 0xffffffff820000ea in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120
#12 0x00007f1cc46ff000 in ?? ()
#13 0x00007ffe84e5a590 in ?? ()
#14 0x00007f1cc46feac0 in ?? ()
#15 0x00007ffe84e5af56 in ?? ()
#16 0x0000000000000003 in fixed_percpu_data ()
#17 0x000055c5e26e72a0 in ?? ()
#18 0x0000000000000246 in ?? ()
#19 0x0000000000000000 in ?? ()
This creates the perfect conditions for a soft lockup.
On arm64, on the other hand, I do not see this happening. The cores keep scheduling in and out of my softlockup_fn(), which prevents the soft lockup from being detected. Does this mean that do_init_module() behaves differently on these architectures?
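If the difference really is preemption, then my understanding is that a variant of the module that explicitly disables preemption should lock up the core on both architectures, regardless of the preemption model. A sketch only (the name hard_spin_init is mine, and I have not verified this on arm64):

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/preempt.h>

static int __init hard_spin_init(void)
{
    preempt_disable();  /* the scheduler can no longer preempt this loop */
    while (1)
        barrier();      /* compiler barrier, as in the original module */
    return 0;           /* unreachable */
}

module_init(hard_spin_init);
MODULE_LICENSE("GPL");
```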
Upvotes: 0
Views: 274