e.ad

Reputation: 380

Exact behaviour of -mno-unaligned-access

I'll start with the question and follow it with an example. The description of this flag in the ARM Compiler armclang Reference Guide, Version 6.4 (link), says:

If unaligned access is disabled, words in packed data structures are accessed one byte at a time.

As you can see in the following example, after the 1-byte access on line 1e0 there is an (aligned) word access on line 1e2. From the description above, I would expect the form of access used on line 1e0 to be used for the remaining bytes of M[1].A as well. I would like an exact description of the behaviour with this flag set: does it always behave as in this example? That is, will it still read whole words from aligned addresses, even inside packed structs?

Example: for this code,

typedef struct __attribute__((packed, aligned(1))) MyStruct{
    int A;
    short B;
    char C;
} MyStruct_t;
int main(void) {
    MyStruct_t M[2];
    int D, E;
    M[0].A = 0xffffffff;
    M[1].A = 0xeeeeeeee;
    D = M[0].A;
    E = M[1].A;
    D = E;
    return 0 ;
}

compiled with -mno-unaligned-access and built like this (using the MCUXpresso IDE):

arm-none-eabi-gcc -nostdlib -Xlinker -Map="m7_experiments.map" -Xlinker --cref -Xlinker --gc-sections -Xlinker -print-memory-usage -mcpu=cortex-m7 -mfpu=fpv5-sp-d16 -mfloat-abi=hard -mthumb -T "m7_experiments_Debug.ld" -o "m7_experiments.axf" $(OBJS) $(USER_OBJS) $(LIBS)

I'm getting the following machine code:

000001b0 <main>:
 1b0:   b480            push    {r7}
 1b2:   b087            sub     sp, #28
 1b4:   af00            add     r7, sp, #0
 1b6:   f04f 33ff       mov.w   r3, #4294967295 ; 0xffffffff
 1ba:   603b            str     r3, [r7, #0]
 1bc:   2300            movs    r3, #0
 1be:   f063 0311       orn     r3, r3, #17
 1c2:   71fb            strb    r3, [r7, #7]
 1c4:   2300            movs    r3, #0
 1c6:   f063 0311       orn     r3, r3, #17
 1ca:   723b            strb    r3, [r7, #8]
 1cc:   2300            movs    r3, #0
 1ce:   f063 0311       orn     r3, r3, #17
 1d2:   727b            strb    r3, [r7, #9]
 1d4:   2300            movs    r3, #0
 1d6:   f063 0311       orn     r3, r3, #17
 1da:   72bb            strb    r3, [r7, #10]
 1dc:   683b            ldr     r3, [r7, #0]
 1de:   617b            str     r3, [r7, #20]
 1e0:   79fb            ldrb    r3, [r7, #7]
 1e2:   68ba            ldr     r2, [r7, #8]
 1e4:   f022 427f       bic.w   r2, r2, #4278190080     ; 0xff000000
 1e8:   0212            lsls    r2, r2, #8
 1ea:   4313            orrs    r3, r2
 1ec:   613b            str     r3, [r7, #16]
 1ee:   693b            ldr     r3, [r7, #16]
 1f0:   617b            str     r3, [r7, #20]
 1f2:   2300            movs    r3, #0
 1f4:   4618            mov     r0, r3
 1f6:   371c            adds    r7, #28
 1f8:   46bd            mov     sp, r7
 1fa:   f85d 7b04       ldr.w   r7, [sp], #4
 1fe:   4770            bx      lr

EDIT: with the complementary flag -munaligned-access we get what would be expected in this case:

000001b0 <main>:
 1b0:   b480            push    {r7}
 1b2:   b087            sub     sp, #28
 1b4:   af00            add     r7, sp, #0
 1b6:   f04f 33ff       mov.w   r3, #4294967295 ; 0xffffffff
 1ba:   603b            str     r3, [r7, #0]
 1bc:   2300            movs    r3, #0
 1be:   f063 0311       orn     r3, r3, #17
 1c2:   71fb            strb    r3, [r7, #7]
 1c4:   2300            movs    r3, #0
 1c6:   f063 0311       orn     r3, r3, #17
 1ca:   723b            strb    r3, [r7, #8]
 1cc:   2300            movs    r3, #0
 1ce:   f063 0311       orn     r3, r3, #17
 1d2:   727b            strb    r3, [r7, #9]
 1d4:   2300            movs    r3, #0
 1d6:   f063 0311       orn     r3, r3, #17
 1da:   72bb            strb    r3, [r7, #10]
 1dc:   683b            ldr     r3, [r7, #0]
 1de:   617b            str     r3, [r7, #20]
 1e0:   f8d7 3007       ldr.w   r3, [r7, #7]
 1e4:   613b            str     r3, [r7, #16]
 1e6:   693b            ldr     r3, [r7, #16]
 1e8:   617b            str     r3, [r7, #20]
 1ea:   2300            movs    r3, #0
 1ec:   4618            mov     r0, r3
 1ee:   371c            adds    r7, #28
 1f0:   46bd            mov     sp, r7
 1f2:   f85d 7b04       ldr.w   r7, [sp], #4
 1f6:   4770            bx      lr

Upvotes: 3

Views: 1248

Answers (2)

old_timer

Reputation: 71506

Your struct is naturally aligned (and packed). Try this instead.

typedef struct __attribute__((packed, aligned(1))) MyStruct{
    unsigned char C;
    unsigned short B;
    unsigned int A;
} MyStruct_t;

GCC appears to default to byte accesses, and clang (the one from LLVM; I don't know what armclang is, and I neither have it nor tried it) defaults to unaligned accesses.

I found that GNU always did stores a byte at a time, but its loads varied with the command-line option. With clang, both stores and loads depended on the command line.

The quote from the link is already flawed because it mentions only words, not halfwords or doublewords (or floats of any kind). In any case you are correct: if the item is aligned, the flag does not force it to be broken into byte accesses; the statement does not match (LLVM) clang. I do not know why one would want it to force byte accesses for aligned items, as in your example. It would make sense to have a flag that avoids unaligned accesses (as the name of the flag implies) and keeps aligned ones.

The quote is also badly written as it implies the software might know if unaligned accesses are disabled. There is no check in the code to see if the processor is set to block unaligned accesses.

You can contact ARM and see if you can get them to fix the web page.

The flag causes packed structs to not generate unaligned accesses.

This is an example of something already significantly better.

The flag causes access to packed structs to only generate aligned accesses.

This is another.

Upvotes: 0

Tom V

Reputation: 5470

The behaviour here is because even though the type is packed and potentially misaligned, the compiler knows that any instance of it on the stack must be aligned, and so aligned members of it can be accessed using word sized reads and writes.

If you access the packed struct through a pointer then the compiler doesn't know its alignment, and so the behaviour is very different.

I have not been able to reproduce this exact behaviour on godbolt because it doesn't have your version of armclang, but look at this example compiled with gcc 11:

typedef struct __attribute__((packed, aligned(1))) MyStruct{
    int A;
    short B;
    char C;
} MyStruct_t;

int main(void) {
    MyStruct_t M[2];
    int D, E;
    M[0].A = 0xffffffff;
    M[1].A = 0xeeeeeeee;
    D = M[0].A;
    E = M[1].A;
    D = E;
    return 0 ;
}

int fn(MyStruct_t *M) {
    int D, E;
    M[0].A = 0xffffffff;
    M[1].A = 0xeeeeeeee;
    D = M[0].A;
    E = M[1].A;
    D = E;
    return 0 ;
}

The same lines which use a str in the first function use four strb in the second.

Upvotes: 0
