Reputation: 9173
Yesterday, I found a catastrophic problem in clang when I was trying to compile code for ARM (at least on Android arm-v7a). Consider this small function:
void init_c_32(uint8_t *ptr)
{
uint32_t tmp[SIZE];
memcpy(tmp, ptr, 33);
}
Here is the generated assembly for the memcpy call:
0x7903d714 <+20>: ldr r0, [sp, #0x10]
0x7903d716 <+22>: add r3, sp, #0x14
0x7903d718 <+24>: mov.w r12, #0x20
0x7903d71c <+28>: str r0, [sp, #0xc]
0x7903d71e <+30>: mov r0, r3
0x7903d720 <+32>: ldr r3, [sp, #0xc]
0x7903d722 <+34>: str r1, [sp, #0x8]
0x7903d724 <+36>: mov r1, r3
0x7903d726 <+38>: str r2, [sp, #0x4]
0x7903d728 <+40>: mov r2, r12
0x7903d72a <+42>: blx 0x7903d658 ; symbol stub for: __aeabi_memcpy
This calls __aeabi_memcpy, and everything works for any ptr address. Now, if we change the argument type to uint32_t *, the generated assembly changes as follows:
void init_c_32(uint32_t *ptr)
{
uint32_t tmp[SIZE];
memcpy(tmp, ptr, 33);
}
0x790456dc <+20>: ldr r0, [sp, #0x8]
0x790456de <+22>: add r3, sp, #0xc
0x790456e0 <+24>: ldm.w r0!, {r4, r5, r12, lr}
0x790456e4 <+28>: stm.w r3!, {r4, r5, r12, lr}
0x790456e8 <+32>: ldm.w r0, {r4, r5, r12, lr}
0x790456ec <+36>: stm.w r3, {r4, r5, r12, lr}
This code is heavily optimized and uses ldm.w and stm.w rather than memcpy. The result is much faster code, but there is a drawback: it does not work with odd ptr addresses and raises a SIGBUS exception, which is consistent with the generated assembly. ldm and stm require a word-aligned address (the .w suffix only selects the 32-bit Thumb-2 encoding). Still, one could argue that this is by design, because we declared the argument as uint32_t *, which tells the compiler that the pointer is aligned.
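To make the alignment assumption explicit, here is a minimal sketch of a check that could run before a pointer is ever treated as a uint32_t *. The helper name is_aligned_u32 is hypothetical and not part of the original code; the sketch assumes a C11 compiler for _Alignof.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when p satisfies the alignment that the
 * compiler is entitled to assume for a uint32_t pointer. */
bool is_aligned_u32(const void *p)
{
    return ((uintptr_t)p % _Alignof(uint32_t)) == 0;
}
```

A caller could use such a check to choose between the fast uint32_t * path and a byte-wise fallback.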
But the main problem is here. Consider the following code:
void init_c_32(__packed uint32_t *ptr)
{
uint32_t tmp[SIZE];
memcpy(tmp, ptr, 33);
}
As you can see, even though the input parameter is still a uint32_t *, we have added the __packed specifier. As the ARM standard specifies, __packed means that:
objects of packed type are read or written using unaligned accesses.
But the generated assembly is the following:
0x78ec56dc <+20>: ldr r0, [sp, #0x8]
0x78ec56de <+22>: add r3, sp, #0xc
0x78ec56e0 <+24>: ldm.w r0!, {r4, r5, r12, lr}
0x78ec56e4 <+28>: stm.w r3!, {r4, r5, r12, lr}
0x78ec56e8 <+32>: ldm.w r0, {r4, r5, r12, lr}
0x78ec56ec <+36>: stm.w r3, {r4, r5, r12, lr}
As you can see, the generated code is no different from the non-__packed version, and this conflicts with the ARM standard. You still cannot pass odd addresses, and you will get a SIGBUS exception. I think in this case the generated code should be similar to what we get with uint8_t * as the argument.
I think this is a very serious bug that can create unexpected results, and any good solution is welcome.
I reproduced this problem with NDK 16, which uses clang 5.0.3 as its compiler.
The current workaround is to always use uint8_t * as the input type, which generates correct code. Efficiency-wise, though, it would be better if this problem were solved.
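The workaround above can be sketched as follows, assuming the source pointer stays typed as a byte or void pointer all the way to the copy so that no alignment is promised to the compiler. The helper names load_u32_unaligned and copy_words_from_any are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load one 32-bit value from any address.
 * memcpy through a void pointer carries no alignment promise. */
uint32_t load_u32_unaligned(const void *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* Hypothetical helper: fill a word buffer from a possibly misaligned source.
 * The caller guarantees that dst has room for nbytes bytes. */
void copy_words_from_any(uint32_t *dst, const void *src, size_t nbytes)
{
    memcpy(dst, src, nbytes);
}
```

Compilers commonly lower a small fixed-size memcpy like the one in load_u32_unaligned to a plain load on targets where unaligned access is permitted, so the cost of this form is often small, though that depends on the target and the optimization level.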
Upvotes: 1
Views: 577
Reputation: 57173
FWIW, clang, unlike the ARM C compiler, does not allow __packed pointers. For clang, __packed is a synonym for __attribute__((__packed__)), which only applies to enum, struct, or union: http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Type-Attributes.html.
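Since the attribute applies to struct types, one possible way to express an unaligned 32-bit access with clang or GCC is to wrap the value in a packed struct. This is only a sketch: the names unaligned_u32 and read_u32_packed are hypothetical, and the __may_alias__ attribute is added here as an assumption so that the struct can be read through a pointer to a byte buffer without violating strict aliasing.

```c
#include <stdint.h>

/* Hypothetical wrapper type: packed lowers the struct's alignment to 1,
 * so the compiler must not assume a word-aligned address for it. */
struct unaligned_u32 {
    uint32_t value;
} __attribute__((__packed__, __may_alias__));

/* Hypothetical helper: read a 32-bit value through the packed wrapper. */
uint32_t read_u32_packed(const void *p)
{
    return ((const struct unaligned_u32 *)p)->value;
}
```

With this form the alignment information lives in the type itself rather than in a qualifier on the pointer, which is the usage the attribute supports.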
Upvotes: 4