Reputation: 6613
Can you enable interrupts in page fault handler? Is there an ARM kernel contention with preemptive scheduling?
I got an ARM kernel oops in the UDP receive code with CONFIG_PREEMPT, or when interrupts are enabled in the fault handler.
The problem is similar to what another user reported here. In my case, when I send UDP packets at 110% load (the system drops about 10% of the packets), the kernel oopses within a few minutes. This happens only if some busybox shell scripts are running, not if the UDP receiving program is running alone. I've tracked the data addresses and they always look good: the buffer was allocated and used before it was freed.
There are two ways to avoid it:
[1] When I change scheduling from preempt (CONFIG_PREEMPT) to preempt_voluntary, the problem goes away. Is this a known issue with ARM on kernel 2.6.39? With preempt scheduling I also see a problem in jffs2 after a long while, but not with preempt_voluntary.
For a moment I suspected that the Ethernet DMA was fully utilizing the bus, blocking the CPU from loading its TLB entries and thus causing the page fault. I deduced this because busybox scripts need to be in the picture: when a script is spawned, it creates an address space and loads many TLB entries, thus overloading the bus. If preempt_voluntary is a solution, can DMA blocking the bus be ruled out?
The test I'm running uses an LTIB kernel 2.6.39.4 lpclinux on a phy3250-based system.
[2] Some more tests showed that the page fault handler is nested by Ethernet interrupts. When I disable interrupts in the kernel page fault handler __dabt_svc but keep them enabled in the user page fault handler __dabt_user, the problem goes away. Otherwise, the nesting level goes up to 4 and the kernel oopses. So the question is: is enabling interrupts in the page fault handler correct?
The test code for [2] is below. Lines marked with @@@@ are added or modified. The nesting level is then captured in do_DataAbort().
file arch/arm/kernel/entry-armv.S:
__dabt_svc:
svc_entry
... ...
@
@ set desired IRQ state, then call main handler
@
debug_entry r1
@@@@Not_Enable_Irq_In_Dabtsvc
ldr r2, =armv_dabtsvc_count @@@@
ldr r3, [r2] @@@@
add r3, r3, #1 @@@@
str r3, [r2] @@@@
msr cpsr_c, r9 @@@@disable this
mov r2, r2 @@@@add this extra inst
mov r2, sp
bl do_DataAbort
@
@ IRQs off again before pulling preserved data off the stack
@
disable_irq_notrace
ldr r2, =armv_dabtsvc_count @@@@
ldr r3, [r2] @@@@
sub r3, r3, #1 @@@@
str r3, [r2] @@@@
@
@ restore SPSR and restart the instruction
@
ldr r2, [sp, #S_PSR]
svc_exit r2 @ return from exception
UNWIND(.fnend )
ENDPROC(__dabt_svc)
And add the variable to the file too:
file arch/arm/kernel/entry-armv.S:
@@@@save nesting level:
.data @@@@
.align @@@@
armv_dabtsvc_count: @@@@
.long 0 @ count svc entry @@@@
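The counter above only holds the current depth, so capturing the deepest nesting needs a high-water mark as well. The following is a minimal user-space C model of that bookkeeping, not kernel code: the names dabt_enter(), dabt_exit(), dabt_max_depth and simulate_nested_aborts() are hypothetical and only mirror what armv_dabtsvc_count does in the assembly.

```c
#include <assert.h>

/* Hypothetical user-space model of the armv_dabtsvc_count bookkeeping. */
static int dabt_depth;      /* mirrors armv_dabtsvc_count */
static int dabt_max_depth;  /* deepest nesting observed so far */

/* Model of handler entry: bump the depth and update the high-water mark. */
static void dabt_enter(void)
{
    dabt_depth++;
    if (dabt_depth > dabt_max_depth)
        dabt_max_depth = dabt_depth;
}

/* Model of handler exit: the depth must never drop below zero. */
static void dabt_exit(void)
{
    assert(dabt_depth > 0);
    dabt_depth--;
}

/* Simulate a handler that is re-entered `levels` times before unwinding. */
static void simulate_nested_aborts(int levels)
{
    if (levels <= 0)
        return;
    dabt_enter();
    simulate_nested_aborts(levels - 1);
    dabt_exit();
}
```

Calling simulate_nested_aborts(4) in this model leaves dabt_max_depth at 4 and dabt_depth back at 0, which corresponds to the nesting level of 4 reported in [2].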
I'm trying to link all of this together. Can kernel experts tell whether these tests make sense? Is disabling interrupts in the page fault handler a valid solution?
Edit: The oops in the page fault handler is not the first failure. There was a "do_bad_area" in a preceding alignment handler. That failed fixup of an unaligned access subsequently caused the page fault. Yes, as someone commented below, fixing unaligned accesses is very troublesome. Those unaligned accesses come from ip_input, ip_fragment, and the UDP stack. Once I fixed all of them in the stack, the problem was gone.
Edit again: The problem is with two operations in the alignment handler: it fetches the instruction, and then fetches the data the instruction refers to. The oops is reported on the data access, but the cause is that the instruction fetch failed first with a page fault. Since the instruction being fetched is in kernel space, the page is always valid, which points to a silicon bug. If I change the code to fetch again, it succeeds, which makes a silicon bug more likely. Interrupts come into the picture because of the excess TLB flushing they bring. In short, TLB loading is automatic, so fetching an instruction in kernel space should not fail. But it still failed.
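The "fetch again" experiment can be sketched as a bounded retry around the instruction fetch. This is a user-space C illustration under stated assumptions, not the code in arch/arm/mm/alignment.c: fetch_with_retry(), flaky_fetch() and struct flaky_ctx are hypothetical names, and the flaky fetch only simulates a read that fails a fixed number of times before succeeding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fetch callback: returns 0 on success, nonzero on failure. */
typedef int (*fetch_fn)(const uint32_t *addr, uint32_t *out, void *ctx);

/* Retry the fetch up to max_tries times; report how many tries were used. */
static int fetch_with_retry(fetch_fn fetch, const uint32_t *addr,
                            uint32_t *out, void *ctx,
                            int max_tries, int *tries_used)
{
    int rc = -1;
    int i;

    assert(max_tries > 0);
    for (i = 1; i <= max_tries; i++) {
        rc = fetch(addr, out, ctx);
        if (rc == 0)
            break;
    }
    /* If every attempt failed, i ran one past the limit; clamp it. */
    *tries_used = (i > max_tries) ? max_tries : i;
    return rc;
}

/* Simulated fetch that fails `failures_left` times, then succeeds. */
struct flaky_ctx {
    int failures_left;
};

static int flaky_fetch(const uint32_t *addr, uint32_t *out, void *ctx)
{
    struct flaky_ctx *c = ctx;

    if (c->failures_left > 0) {
        c->failures_left--;
        return -1;
    }
    *out = *addr;
    return 0;
}
```

In this model a fetch that fails once succeeds on the second try, which matches the observation above. A retry like this hides the symptom; it does not explain why a read of a valid kernel page failed in the first place.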
Upvotes: 1
Views: 2905
Reputation: 6613
I guess this is the answer (incomplete, to be tested):
There is a problem when interrupts are enabled too early. do_alignment() calls __get_user() with interrupts already enabled, although that code can run in atomic context (see the second commit below). If enabling interrupts is deferred until after that point, everything should be OK.
Please look at two kernel commits. The first, from Jun 25 2011, defers enabling interrupts. The second, from Feb 25 2013, changes the uses of __get_user() to probe_kernel_address().
The first commit:
The 3.x kernel removed interrupt-enabling from the low-level handlers __dabt_svc, __dabt_user, etc. The commit message:
git diff 8b418616..02fe2845 entry-armv.S
commit 02fe2845d6a837ab02f0738f6cf4591a02cc88d4
Author: Russell King <[email protected]>
Date: Sat Jun 25 11:44:06 2011 +0100
ARM: entry: avoid enabling interrupts in prefetch/data abort handlers
Avoid enabling interrupts if the parent context had interrupts enabled
in the abort handler assembly code, and move this into the breakpoint/
page/alignment fault handlers instead.
This gets rid of some special-casing for the breakpoint fault handlers
from the low level abort handler path.
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>
commit 8b4186160b7894ca4583f702a562856d5d9e9118
Author: Russell King <[email protected]>
Date: Sat Jun 25 19:25:02 2011 +0100
And the code diff snippet:
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index d644d02..c46bafa 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -185,20 +185,15 @@ ENDPROC(__und_invalid)
__dabt_svc:
svc_entry
... ...
dabt_helper
@
- @ set desired IRQ state, then call main handler
+ @ call main handler
@
- debug_entry r1
- msr cpsr_c, r9
mov r2, sp
bl do_DataAbort
......
That confirms that interrupts do not need to be enabled that early in the fault handlers.
The second commit:
commit b255188f90e2bade1bd11a986dd1ca4861869f4d
Author: Russell King <[email protected]>
Date: Mon Feb 25 16:10:42 2013 +0000
ARM: fix scheduling while atomic warning in alignment handling code
Paolo Pisati reports that IPv6 triggers this warning:
BUG: scheduling while atomic: swapper/0/0/0x40000100
[<c001b1c4>] (unwind_backtrace+0x0/0xf0) from [<c0503c5c>] (__schedule_bug+0x48/0x5c)
[<c0503c5c>] (__schedule_bug+0x48/0x5c) from [<c0508608>] (__schedule+0x700/0x740)
[<c0508608>] (__schedule+0x700/0x740) from [<c007007c>] (__cond_resched+0x24/0x34)
[<c007007c>] (__cond_resched+0x24/0x34) from [<c05086dc>] (_cond_resched+0x3c/0x44)
[<c05086dc>] (_cond_resched+0x3c/0x44) from [<c0021f6c>] (do_alignment+0x178/0x78c)
[<c0021f6c>] (do_alignment+0x178/0x78c) from [<c00083e0>] (do_DataAbort+0x34/0x98)
[<c00083e0>] (do_DataAbort+0x34/0x98) from [<c0509a60>] (__dabt_svc+0x40/0x60)
Exception stack(0xc0763d70 to 0xc0763db8)
[<c0509a60>] (__dabt_svc+0x40/0x60) from [<c02a8490>] (__csum_ipv6_magic+0x8/0xc8)
Fix this by using probe_kernel_address() stead of __get_user().
arch/arm/mm/alignment.c | 11 ++++-------
Upvotes: 1