Afshin

Reputation: 9173

Serious clang bug when compiling ARM code

Yesterday I found a serious problem in clang while compiling code for ARM (on Android armeabi-v7a, at least). Consider this small function:

void init_c_32(uint8_t *ptr)
{
    uint32_t tmp[SIZE];
    memcpy(tmp, ptr, 33);
}

Here is the generated assembly for the memcpy call:

0x7903d714 <+20>: ldr    r0, [sp, #0x10]
0x7903d716 <+22>: add    r3, sp, #0x14
0x7903d718 <+24>: mov.w  r12, #0x20
0x7903d71c <+28>: str    r0, [sp, #0xc]
0x7903d71e <+30>: mov    r0, r3
0x7903d720 <+32>: ldr    r3, [sp, #0xc]
0x7903d722 <+34>: str    r1, [sp, #0x8]
0x7903d724 <+36>: mov    r1, r3
0x7903d726 <+38>: str    r2, [sp, #0x4]
0x7903d728 <+40>: mov    r2, r12
0x7903d72a <+42>: blx    0x7903d658                ; symbol stub for: __aeabi_memcpy

This calls __aeabi_memcpy, and everything works for any ptr address. Now, if we change the argument type to uint32_t *, the generated assembly changes as follows:

void init_c_32(uint32_t *ptr)
{
    uint32_t tmp[SIZE];
    memcpy(tmp, ptr, 33);
}

0x790456dc <+20>: ldr    r0, [sp, #0x8]
0x790456de <+22>: add    r3, sp, #0xc
0x790456e0 <+24>: ldm.w  r0!, {r4, r5, r12, lr}
0x790456e4 <+28>: stm.w  r3!, {r4, r5, r12, lr}
0x790456e8 <+32>: ldm.w  r0, {r4, r5, r12, lr}
0x790456ec <+36>: stm.w  r3, {r4, r5, r12, lr}

This code is heavily optimized: it uses ldm.w and stm.w instead of calling memcpy, so it is much faster. But there is a drawback. It does not work for ptr addresses that are not word-aligned (odd addresses, for example) and raises a SIGBUS exception, which is the expected behavior for this assembly: ldm and stm require a word-aligned base address (the .w suffix only selects the 32-bit Thumb-2 encoding). One could argue that this is by design, because declaring the argument as uint32_t * promises that the pointer is aligned.
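To make the alignment assumption concrete, here is a minimal sketch of a check that tells whether a pointer is safe to pass as uint32_t *. The helper name is_u32_aligned is hypothetical and not part of the original code; it only illustrates the rule described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when p satisfies the alignment
 * that the compiler is allowed to assume for a uint32_t object. */
static inline bool is_u32_aligned(const void *p)
{
    return ((uintptr_t)p % _Alignof(uint32_t)) == 0;
}
```

A caller could use such a check in a debug build before casting a byte buffer to uint32_t *; when it returns false, the ldm/stm sequence shown above would be expected to fault.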

But the main problem is here. Consider the following code:

void init_c_32(__packed uint32_t *ptr)
{
    uint32_t tmp[SIZE];
    memcpy(tmp, ptr, 33);
}

As you can see, even though we have specified uint32_t * as the parameter type, we have added the __packed qualifier. According to the ARM compiler documentation, __packed means that:

objects of packed type are read or written using unaligned accesses.

But the generated assembly is the following:

0x78ec56dc <+20>: ldr    r0, [sp, #0x8]
0x78ec56de <+22>: add    r3, sp, #0xc
0x78ec56e0 <+24>: ldm.w  r0!, {r4, r5, r12, lr}
0x78ec56e4 <+28>: stm.w  r3!, {r4, r5, r12, lr}
0x78ec56e8 <+32>: ldm.w  r0, {r4, r5, r12, lr}
0x78ec56ec <+36>: stm.w  r3, {r4, r5, r12, lr}

As you can see, the generated code is identical to the non-__packed version, which conflicts with the ARM specification of __packed. You still cannot pass odd addresses; doing so raises a SIGBUS exception. I think the generated code in this case should be similar to what we get with a uint8_t * argument.

I think this is a very serious bug that can create unexpected results, and any good solution is welcome.

I used NDK 16 to reproduce this problem; it uses clang 5.0.3 as its compiler.

My current workaround is to always use uint8_t * for the input, which generates correct code. Efficiency-wise, though, it would be better if this problem were solved.
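The workaround can be sketched as a small helper that only ever sees a byte-level pointer, so the compiler has no alignment promise to exploit. The name load_u32_unaligned is hypothetical and not from the original code; this is a sketch under that assumption, not a definitive fix.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads one uint32_t from an address with no
 * alignment guarantee. Because the source is a const void *, the
 * compiler cannot assume 4-byte alignment when it expands memcpy. */
static inline uint32_t load_u32_unaligned(const void *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

The pointer must stay a uint8_t * or void * all the way to the call; creating a misaligned uint32_t * first and casting it back would bring the original problem back.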

Upvotes: 1

Views: 577

Answers (1)

Alex Cohn

Reputation: 57173

FWIW, clang, unlike the ARM C compiler, does not allow __packed pointers. For clang, __packed is a synonym for __attribute__((__packed__)) which only applies to enum, struct, or union: http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Type-Attributes.html.
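Since the attribute applies to struct types, one common way to get unaligned access in clang or GCC is to wrap the value in a packed struct. The following is a minimal sketch; the type unaligned_u32 and the two helpers are hypothetical names introduced for illustration, and the code relies on the GCC/clang __attribute__ extension.

```c
#include <stdint.h>

/* Hypothetical wrapper: the packed attribute lowers the alignment of
 * the member to 1, so the compiler must emit code that tolerates
 * unaligned addresses when accessing it. */
struct __attribute__((__packed__)) unaligned_u32 {
    uint32_t value;
};

/* Read a 32-bit value through the packed wrapper. */
static inline uint32_t read_u32(const struct unaligned_u32 *p)
{
    return p->value;
}

/* Write a 32-bit value through the packed wrapper. */
static inline void write_u32(struct unaligned_u32 *p, uint32_t v)
{
    p->value = v;
}
```

A function like init_c_32 could then take a const struct unaligned_u32 * instead of a __packed uint32_t *, which expresses the same intent in a form clang accepts.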

Upvotes: 4
