Reputation: 1378
Environment: GCC 4.7.3 (arm-none-eabi-gcc) for ARM Cortex m4f. Bare-metal (actually MQX RTOS, but here that's irrelevant). The CPU is in Thumb state.
Here's a disassembler listing of some code I'm looking at:
//.label flash_command
// ...
while(!(FTFE_FSTAT & FTFE_FSTAT_CCIF_MASK)) {}
// Compiles to:
12: bf00 nop
14: f04f 0300 mov.w r3, #0
18: f2c4 0302 movt r3, #16386 ; 0x4002
1c: 781b ldrb r3, [r3, #0]
1e: b2db uxtb r3, r3
20: b2db uxtb r3, r3
22: b25b sxtb r3, r3
24: 2b00 cmp r3, #0
26: daf5 bge.n 14 <flash_command+0x14>
The constants (after expending macros, etc.) are:
address of FTFE_FSTAT is 0x40020000u
FTFE_FSTAT_CCIF_MASK is 0x80u
This is compiled with NO optimization (-O0), so GCC shouldn't be doing anything fancy... and yet, I don't get this code. Post-answer edit: Never assume this. My problem was getting a false sense of security from turning off optimization.
I've read that "uxtb r3,r3" is a common way of truncating a 32-bit value. Why would you want to truncate it twice and then sign-extend? And how in the world is this equivalent to the bit-masking operation in the C-code?
What am I missing here?
Edit: Types of the thing involved: So the actual macro expansion of FTFE_FSTAT comes down to
((((FTFE_MemMapPtr)0x40020000u))->FSTAT)
where the struct is defined as
/** FTFE - Peripheral register structure */
typedef struct FTFE_MemMap {
uint8_t FSTAT; /**< Flash Status Register, offset: 0x0 */
uint8_t FCNFG; /**< Flash Configuration Register, offset: 0x1 */
//... a bunch of other uint_8
} volatile *FTFE_MemMapPtr;
Upvotes: 2
Views: 1591
Reputation: 34829
The two uxtb
instructions are the compiler being stupid, they should be optimized out if you turn on optimization. The sxtb
is the compiler being brilliant, using a trick that you wouldn't expect in unoptimized code.
The first uxtb
is due to the fact that you loaded a byte from memory. The compiler is zeroing the other 24 bits of register r3, so that the byte value fills the entire register.
The second uxtb
is due to the fact that you're ANDing with an 8-bit value. The compiler realizes that the upper 24-bits of the result will always be zero, so it's using uxtb
to clear the upper 24-bits.
Neither of the uxtb
instructions does anything useful, because the sxtb
instruction overwrites the upper 24 bits of r3
anyways. The optimizer should realize that and remove them when you compile with optimizations enabled.
The sxtb
instruction takes the one bit you care about 0x80
and moves it into the sign bit of register r3
. That way, if bit 0x80
is set, then r3
becomes a negative number. So now the compiler can compare with 0
to determine whether the bit was set. If the bit was not set then the bge
instruction branches back to the top of the while
loop.
Upvotes: 7