Jackoo

What is the reason for this data abort on ZYNQ 7000 (ARM Cortex-A9)?

I'm using a ZYNQ 7000 SoC, which has two ARM Cortex-A9 cores, core0 and core1. Sometimes a data abort occurs in my core1 code (bare metal). In the default data abort handler, Xil_DataAbortHandler, FaultStatus is 0x1e and DataAbortAddr is 0x2001bc9c.

I used readelf -s a.elf to get the symbol table and found no function whose address exactly matches 0x2001bc9c. The closest preceding function is Xil_L2CacheDisable at 0x2001bc64. Does this mean the data abort comes from Xil_L2CacheDisable? It is a BSP library function provided by Xilinx, and I'm using it so that the two cores can access the shared memory directly.

The global variable u32 DataAbortAddr is captured by the following assembly:

DataAbortHandler:               /* Data Abort handler */
#ifdef CONFIG_ARM_ERRATA_775420
    dsb
#endif
    stmdb   sp!,{r0-r3,r12,lr}      /* state save from compiled code */
    ldr     r0, =DataAbortAddr
    sub     r1, lr, #8
    str     r1, [r0]                    /* Stores instruction causing data abort */

    bl  DataAbortInterrupt      /*DataAbortInterrupt :call C function here */

    ldmia   sp!,{r0-r3,r12,lr}      /* state restore from compiled code */

    subs    pc, lr, #8          /* points to the instruction that caused the Data Abort exception */

FaultStatus is read in DataAbortInterrupt:

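#include "xil_types.h"  /* Xilinx BSP header that defines u32 */

#define mfcp(rn)    ({u32 rval = 0U; \
             __asm__ __volatile__(\
               "mrc " rn "\n"\
               : "=r" (rval)\
             );\
             rval;\
             })

/* GCC spelling: %0 is the output operand of mfcp. The "cp15:0:c5:c0:0"
   spelling belongs to the non-GCC toolchain branch of the Xilinx headers. */
#define XREG_CP15_DATA_FAULT_STATUS     "p15, 0, %0,  c5,  c0, 0"

void DataAbortInterrupt(void)
{
    /* Raw DFSR value. mfcp is a GCC statement expression, so it must run inside a function. */
    u32 FaultStatus = mfcp(XREG_CP15_DATA_FAULT_STATUS);
    /* ... report FaultStatus and DataAbortAddr ... */
}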

Answers (1)

user3124812

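The ZYNQ 7000 SoC's Cortex-A9 cores implement ARMv7-A, so everything below comes from the ARMv7-A Architecture Reference Manual.

"cp15:0:c5:c0:0" reads the DFSR (Data Fault Status Register).
With the short-descriptor translation table format (the only one the Cortex-A9 supports, since it does not implement LPAE), the 5-bit Fault Status (FS) field is split within the DFSR: FS[3:0] are bits [3:0] and FS[4] is bit 10, while bits [7:4] are the domain field and bit 11 (WnR) indicates whether the faulting access was a write.
An FS value of 0x1E (0b11110) means "Synchronous parity error on translation table walk, second level" (see the FSR encodings table in the manual), which would most likely be parity or ECC logic reporting a bad memory chip.
However, mfcp returns the whole register, so if 0x1E is the raw DFSR value, bit 10 is 0 and FS is actually 0b01110: "Synchronous external abort on translation table walk, second level", on a read access. In that case the memory system returned an error while the MMU was fetching a second-level translation table descriptor.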

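DataAbortAddr comes from the handler shown in the question, which stores lr - 8. On a data abort, LR_abt holds the address of the aborting instruction plus 8 (in both ARM and Thumb state), so 0x2001bc9c is the address of the instruction that caused the abort, not the address of the data it was accessing. The faulting data address is reported separately in the DFAR (Data Fault Address Register, CP15 c6, c0, 0).

To see how execution reached that instruction, you would still need to backtrace from the interrupted context (its LR and stack).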

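I'd also recommend objdump -d instead of readelf for this: the symbol table only gives each function's start address and size, whereas the disassembly shows the exact instruction at 0x2001bc9c and which function contains it. GNU objdump's --start-address and --stop-address options can limit the output to the region around that address.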
