Fake Name
Fake Name

Reputation: 5874

Overeager struct packing warnings with `__attribute__((packed))`?

I'm implementing a binary logging system on a 32 bit ARM mcu (Atmel SAM4SD32C, a Cortex-M4/ARMv7E-M part), and in the process of designing my data structures. My goal is to describe the log format as a packed struct, and simply union the struct with a char array, for writing to the log device (a SD card, via FatFS, in this case).

Basically, I have a very simple struct:

typedef struct adc_samples_t
{
    int32_t adc_samples[6];

    uint64_t acq_time;

    int8_t  overrun;
    uint8_t padding_1;
    uint8_t padding_2;
    uint8_t padding_3;

} __attribute__((packed, aligned(4))) adc_sample_set;

Now, my architecture is 32 bits, so as far as I understand, access to any member /other/ then the overrun member should be 32-bit aligned, and therefore not have an extra overhead. Furthermore, the aligned(4) attribute should force any instantiations of the struct to be on a 32-bit aligned boundary.

However, compiling the above struct definition produces a pile of warnings:

        In file included from ../src/main.c:13:0:
<snip>\src\fs\fs-logger.h(10,10): warning: packed attribute causes inefficient alignment for 'adc_samples' [-Wattributes]
          int32_t adc_samples[6];
                  ^
<snip>\src\fs\fs-logger.h(12,11): warning: packed attribute causes inefficient alignment for 'acq_time' [-Wattributes]
          uint64_t acq_time;

As far as I know (and I'm now realizing this is a big assumption), I assumed that 32-bit alignment was all that was needed for optimal component positioning on 32 bit arm. Oddly, the only member that does /not/ produce warnings are the overrun and padding_X members, which I don't understand the causes for. (Ok, the ARM docs say Byte accesses are always aligned.)

What, exactly, is going on here? I assume (possibly incorrectly) that the struct instantiation will be on 4 bytes boundaries. Does the compiler require a more broad alignment (on 8 byte boundaries)?


Edit: Ok, digging into the ARM docs (the magic words here were "Cortex-M4 alignment":

3.3.5. Address alignment

An aligned access is an operation where a word-aligned address is used for a word, dual word, or multiple word access, or where a halfword-aligned address is used for a halfword access. Byte accesses are always aligned.

The Cortex-M4 processor supports unaligned access only for the following instructions:

LDR, LDRT
LDRH, LDRHT
LDRSH, LDRSHT
STR, STRT
STRH, STRHT

All other load and store instructions generate a UsageFault exception if they perform an unaligned access, and therefore their accesses must be address aligned. For more information about UsageFaults see Fault handling.

Unaligned accesses are usually slower than aligned accesses. In addition, some memory regions might not support unaligned accesses. Therefore, ARM recommends that programmers ensure that accesses are aligned. To trap accidental generation of unaligned accesses, use the UNALIGN_TRP bit in the Configuration and Control Register, see Configuration and Control Register.

How is my 32-bit aligned value not word-aligned? The user guide defines "Aligned" as the following:

Aligned
A data item stored at an address that is divisible by the number of bytes that defines the data size is said to be aligned. Aligned words and halfwords have addresses that are divisible by four and two respectively. The terms word-aligned and halfword-aligned therefore stipulate addresses that are divisible by four and two respectively.

Upvotes: 5

Views: 4402

Answers (2)

Notlikethat
Notlikethat

Reputation: 20914

I assumed that 32-bit alignment was all that was needed for optimal component positioning on 32-bit ARM

It is.

But you don't have 32-bit alignment here [in the originally-asked question] because:

Specifying the packed attribute for struct and union types is equivalent to specifying the packed attribute on each of the structure or union members.

given that:

The packed attribute specifies that a variable or structure field should have the smallest possible alignment—one byte for a variable, and one bit for a field, unless you specify a larger value with the aligned attribute.

In other words, if you still want a packed structure to still have some minimum alignment after you've forced the alignment of all members, and thus the type itself, to nothing, you need to specify so - the fact that that might not actually make -Wpacked shut up is a different matter - GCC may well just spit that out reflexively before it actually considers any further alignment modifiers.

Note that in terms of serialisation, you don't necessarily need to pack it anyway. The members fit in 9 words exactly, so the only compiler padding anywhere is an extra word at the end to round the total size up to 40 bytes, since acq_time forces the struct to a natural alignment of 8. Unless you want to operate on a whole array of these things at once, you can get away with simply ignoring that and still treating the members as one 36-byte chunk.

Upvotes: 4

Fake Name
Fake Name

Reputation: 5874

Ok, at this point, I'm somewhat confident that the warning is being emitted in error.

I have a statically defined instance of the struct, and at one point I zero it:

adc_sample_set running_average;
int accumulated_samples;

inline void zero_average_buf(void)
{

    accumulated_samples = 0;

    running_average.adc_samples[0] = 0;
    running_average.adc_samples[1] = 0;
    running_average.adc_samples[2] = 0;
    running_average.adc_samples[3] = 0;
    running_average.adc_samples[4] = 0;
    running_average.adc_samples[5] = 0;

    running_average.overrun = 0;

    running_average.acq_time = 0;

}

The disassembly for the function is the follows:

{
004005F8   push {r3, lr}         
    accumulated_samples = 0;
004005FA   movs r2, #0       
004005FC   ldr  r3, [pc, #36]        
004005FE   str  r2, [r3]         
    running_average.adc_samples[0] = 0;
00400600   ldr  r3, [pc, #36]        
00400602   str  r2, [r3]         
    running_average.adc_samples[1] = 0;
00400604   str  r2, [r3, #4]         
    running_average.adc_samples[2] = 0;
00400606   str  r2, [r3, #8]         
    running_average.adc_samples[3] = 0;
00400608   str  r2, [r3, #12]        
    running_average.adc_samples[4] = 0;
0040060A   str  r2, [r3, #16]        
    running_average.adc_samples[5] = 0;
0040060C   str  r2, [r3, #20]        
    running_average.overrun = 0;
0040060E   strb.w   r2, [r3, #32]        
    running_average.acq_time = 0;
00400612   movs r0, #0       
00400614   movs r1, #0       
00400616   strd r0, r1, [r3, #24]        

Note that r3 in the above is 0x2001ef70, which is indeed 4-byte aligned. r2 is the literal value 0.

The str opcode requires 4-byte alignment. The strd opcode only requires 4 byte alignment as well, since it appears to really be two sequential 4-byte operations, though I don't know how it actually works internally.


If I intentionally mis-align my struct, to force the slow-path copy operation:

typedef struct adc_samples_t
{
    int8_t  overrun;
    int32_t adc_samples[6];

    uint64_t acq_time;


    uint8_t padding_1;
    uint8_t padding_2;
    uint8_t padding_3;

} __attribute__((packed, aligned(8))) adc_sample_set;

I get the following assembly:

{
00400658   push {r3, lr}         
    accumulated_samples = 0;
0040065A   movs r3, #0       
0040065C   ldr  r2, [pc, #84]        
0040065E   str  r3, [r2]         
    running_average.adc_samples[0] = 0;
00400660   ldr  r2, [pc, #84]        
00400662   strb r3, [r2, #1]         
00400664   strb r3, [r2, #2]         
00400666   strb r3, [r2, #3]         
00400668   strb r3, [r2, #4]         
    running_average.adc_samples[1] = 0;
0040066A   strb r3, [r2, #5]         
0040066C   strb r3, [r2, #6]         
0040066E   strb r3, [r2, #7]         
00400670   strb r3, [r2, #8]         
    running_average.adc_samples[2] = 0;
00400672   strb r3, [r2, #9]         
00400674   strb r3, [r2, #10]        
00400676   strb r3, [r2, #11]        
00400678   strb r3, [r2, #12]        
    running_average.adc_samples[3] = 0;
0040067A   strb r3, [r2, #13]        
0040067C   strb r3, [r2, #14]        
0040067E   strb r3, [r2, #15]        
00400680   strb r3, [r2, #16]        
    running_average.adc_samples[4] = 0;
00400682   strb r3, [r2, #17]        
00400684   strb r3, [r2, #18]        
00400686   strb r3, [r2, #19]        
00400688   strb r3, [r2, #20]        
    running_average.adc_samples[5] = 0;
0040068A   strb r3, [r2, #21]        
0040068C   strb r3, [r2, #22]        
0040068E   strb r3, [r2, #23]        
00400690   strb r3, [r2, #24]        
    running_average.overrun = 0;
00400692   mov  r1, r2       
00400694   strb r3, [r1], #25        
    running_average.acq_time = 0;
00400698   strb r3, [r2, #25]        
0040069A   strb r3, [r1, #1]         
0040069C   strb r3, [r1, #2]         
0040069E   strb r3, [r1, #3]         
004006A0   strb r3, [r1, #4]         
004006A2   strb r3, [r1, #5]         
004006A4   strb r3, [r1, #6]         
004006A6   strb r3, [r1, #7]    

So, pretty clearly, I'm getting the proper aligned-copy behaviour with my original struct definition, despite the compiler apparently incorrectly warning that it will result in inefficient accesses.

Upvotes: 3

Related Questions