How does this disassembly correspond to the given C code?

Question

Environment: GCC 4.7.3 (arm-none-eabi-gcc) for an ARM Cortex-M4F. Bare-metal (actually MQX RTOS, but that's irrelevant here). The CPU is in Thumb state.

Here's a disassembler listing of some code I'm looking at:

//.label flash_command
// ...
while(!(FTFE_FSTAT & FTFE_FSTAT_CCIF_MASK)) {}
// Compiles to:
12: bf00        nop
14: f04f 0300   mov.w   r3, #0
18: f2c4 0302   movt    r3, #16386  ; 0x4002
1c: 781b        ldrb    r3, [r3, #0]
1e: b2db        uxtb    r3, r3
20: b2db        uxtb    r3, r3
22: b25b        sxtb    r3, r3
24: 2b00        cmp r3, #0
26: daf5        bge.n   14

The constants (after expanding macros, etc.) are:

address of FTFE_FSTAT is 0x40020000u
FTFE_FSTAT_CCIF_MASK is 0x80u

This is compiled with NO optimization (-O0), so GCC shouldn't be doing anything fancy... and yet, I don't get this code. Post-answer edit: Never assume this. My problem was getting a false sense of security from turning off optimization.

I've read that "uxtb r3,r3" is a common way of truncating a 32-bit value. Why would you want to truncate it twice and then sign-extend? And how in the world is this equivalent to the bit-masking operation in the C code?

What am I missing here?

Edit: Types of the things involved: the actual macro expansion of FTFE_FSTAT comes down to

((((FTFE_MemMapPtr)0x40020000u))->FSTAT)

where the struct is defined as

/** FTFE - Peripheral register structure */
typedef struct FTFE_MemMap {
    uint8_t FSTAT; /**< Flash Status Register, offset: 0x0 */
    uint8_t FCNFG; /**< Flash Configuration Register, offset: 0x1 */
    //... a bunch of other uint8_t members
} volatile *FTFE_MemMapPtr;

user3386109 · Accepted Answer

The two uxtb instructions are the compiler being stupid; they should disappear once you turn on optimization. The sxtb is the compiler being brilliant, using a trick that you wouldn't expect in unoptimized code.

The first uxtb is due to the fact that you loaded a byte from memory. The compiler is zeroing the other 24 bits of register r3, so that the byte value fills the entire register.

The second uxtb is due to the fact that you're ANDing with an 8-bit value. The compiler realizes that the upper 24 bits of the result will always be zero, so it's using uxtb to clear the upper 24 bits.

Neither of the uxtb instructions does anything useful, because the sxtb instruction overwrites the upper 24 bits of r3 anyway. The optimizer should realize that and remove them when you compile with optimizations enabled.
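The redundancy claim can be checked with a small C model. This is only a sketch: model_uxtb and model_sxtb are hypothetical helpers written for illustration. They mimic what the two instructions do to a register (keep the low byte, then zero-extend or sign-extend it) and are not part of the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "uxtb rX, rX": keep the low byte, zero the upper 24 bits. */
uint32_t model_uxtb(uint32_t r)
{
    return r & 0xFFu;
}

/* Hypothetical model of "sxtb rX, rX": keep the low byte, copy bit 7 into bits 8..31. */
uint32_t model_sxtb(uint32_t r)
{
    uint32_t low = r & 0xFFu;
    return (low & 0x80u) ? (low | 0xFFFFFF00u) : low;
}

/* Counts the inputs for which the two extra uxtb steps change the final result. */
unsigned count_uxtb_mismatches(void)
{
    unsigned mismatches = 0;
    for (uint32_t byte = 0; byte <= 0xFFu; ++byte) {
        /* Try a clean register and one with junk in the upper 24 bits. */
        uint32_t inputs[2] = { byte, byte | 0xABCDEF00u };
        for (int i = 0; i < 2; ++i) {
            uint32_t with_uxtb = model_sxtb(model_uxtb(model_uxtb(inputs[i])));
            uint32_t without   = model_sxtb(inputs[i]);
            if (with_uxtb != without) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}

/* Aborts if dropping the uxtb instructions would ever change the result. */
void check_uxtb_redundant(void)
{
    assert(count_uxtb_mismatches() == 0);
}
```

Calling check_uxtb_redundant() passes silently, because sxtb only looks at the low byte and rewrites everything above it, whatever the earlier uxtb steps did.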

The sxtb instruction takes the one bit you care about (bit 7, i.e. 0x80) and copies it into bits 8 through 31 of r3, including the sign bit. That way, if bit 0x80 is set, r3 becomes a negative number, and the compiler can compare with 0 to determine whether the bit was set. If the bit was not set, r3 is greater than or equal to zero and the bge instruction branches back to the top of the while loop.
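To see that the sign test really matches the mask test in the source, the two can be compared over all 256 byte values. This is a minimal sketch with hypothetical function names; the sign extension is written out arithmetically so that it does not depend on implementation-defined narrowing casts.

```c
#include <assert.h>
#include <stdint.h>

/* The test as written in the C source: is bit 7 (0x80) set? */
int ccif_set_by_mask(uint8_t fstat)
{
    return (fstat & 0x80u) != 0;
}

/* The test as compiled: sign-extend the byte (sxtb), then compare with 0 (cmp). */
int ccif_set_by_sign(uint8_t fstat)
{
    int32_t extended = (fstat & 0x80u) ? (int32_t)fstat - 256 : (int32_t)fstat;
    return extended < 0;
}

/* Counts the byte values on which the two tests disagree. */
unsigned count_disagreements(void)
{
    unsigned disagreements = 0;
    for (unsigned value = 0; value <= 0xFFu; ++value) {
        if (ccif_set_by_mask((uint8_t)value) != ccif_set_by_sign((uint8_t)value)) {
            ++disagreements;
        }
    }
    return disagreements;
}
```

The while loop in the question keeps spinning while the mask test is false, which is the same condition under which the sign-extended value is non-negative and bge is taken.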
