Reputation: 5874
I'm implementing a binary logging system on a 32 bit ARM mcu (Atmel SAM4SD32C, a Cortex-M4/ARMv7E-M part), and in the process of designing my data structures. My goal is to describe the log format as a packed struct, and simply union the struct with a char array, for writing to the log device (a SD card, via FatFS, in this case).
Basically, I have a very simple struct:
typedef struct adc_samples_t
{
int32_t adc_samples[6];
uint64_t acq_time;
int8_t overrun;
uint8_t padding_1;
uint8_t padding_2;
uint8_t padding_3;
} __attribute__((packed, aligned(4))) adc_sample_set;
Now, my architecture is 32 bits, so as far as I understand, access to any member /other/ then the overrun
member should be 32-bit aligned, and therefore not have an extra overhead. Furthermore, the aligned(4)
attribute should force any instantiations of the struct to be on a 32-bit aligned boundary.
However, compiling the above struct definition produces a pile of warnings:
In file included from ../src/main.c:13:0:
<snip>\src\fs\fs-logger.h(10,10): warning: packed attribute causes inefficient alignment for 'adc_samples' [-Wattributes]
int32_t adc_samples[6];
^
<snip>\src\fs\fs-logger.h(12,11): warning: packed attribute causes inefficient alignment for 'acq_time' [-Wattributes]
uint64_t acq_time;
As far as I know (and I'm now realizing this is a big assumption), I assumed that 32-bit alignment was all that was needed for optimal component positioning on 32 bit arm. Oddly, the only member that does /not/ produce warnings are the (Ok, the ARM docs say overrun
and padding_X
members, which I don't understand the causes for.Byte accesses are always aligned.
)
What, exactly, is going on here? I assume (possibly incorrectly) that the struct instantiation will be on 4 bytes boundaries. Does the compiler require a more broad alignment (on 8 byte boundaries)?
Edit: Ok, digging into the ARM docs (the magic words here were "Cortex-M4 alignment":
3.3.5. Address alignment
An aligned access is an operation where a word-aligned address is used for a word, dual word, or multiple word access, or where a halfword-aligned address is used for a halfword access. Byte accesses are always aligned.
The Cortex-M4 processor supports unaligned access only for the following instructions:
LDR, LDRT
LDRH, LDRHT
LDRSH, LDRSHT
STR, STRT
STRH, STRHTAll other load and store instructions generate a UsageFault exception if they perform an unaligned access, and therefore their accesses must be address aligned. For more information about UsageFaults see Fault handling.
Unaligned accesses are usually slower than aligned accesses. In addition, some memory regions might not support unaligned accesses. Therefore, ARM recommends that programmers ensure that accesses are aligned. To trap accidental generation of unaligned accesses, use the UNALIGN_TRP bit in the Configuration and Control Register, see Configuration and Control Register.
How is my 32-bit aligned value not word-aligned? The user guide defines "Aligned" as the following:
Aligned
A data item stored at an address that is divisible by the number of bytes that defines the data size is said to be aligned. Aligned words and halfwords have addresses that are divisible by four and two respectively. The terms word-aligned and halfword-aligned therefore stipulate addresses that are divisible by four and two respectively.
Upvotes: 5
Views: 4402
Reputation: 20914
I assumed that 32-bit alignment was all that was needed for optimal component positioning on 32-bit ARM
It is.
But you don't have 32-bit alignment here [in the originally-asked question] because:
Specifying the
packed
attribute forstruct
andunion
types is equivalent to specifying thepacked
attribute on each of the structure or union members.
The
packed
attribute specifies that a variable or structure field should have the smallest possible alignment—one byte for a variable, and one bit for a field, unless you specify a larger value with thealigned
attribute.
In other words, if you still want a packed structure to still have some minimum alignment after you've forced the alignment of all members, and thus the type itself, to nothing, you need to specify so - the fact that that might not actually make -Wpacked
shut up is a different matter - GCC may well just spit that out reflexively before it actually considers any further alignment modifiers.
Note that in terms of serialisation, you don't necessarily need to pack it anyway. The members fit in 9 words exactly, so the only compiler padding anywhere is an extra word at the end to round the total size up to 40 bytes, since acq_time
forces the struct to a natural alignment of 8. Unless you want to operate on a whole array of these things at once, you can get away with simply ignoring that and still treating the members as one 36-byte chunk.
Upvotes: 4
Reputation: 5874
Ok, at this point, I'm somewhat confident that the warning is being emitted in error.
I have a statically defined instance of the struct, and at one point I zero it:
adc_sample_set running_average;
int accumulated_samples;
inline void zero_average_buf(void)
{
accumulated_samples = 0;
running_average.adc_samples[0] = 0;
running_average.adc_samples[1] = 0;
running_average.adc_samples[2] = 0;
running_average.adc_samples[3] = 0;
running_average.adc_samples[4] = 0;
running_average.adc_samples[5] = 0;
running_average.overrun = 0;
running_average.acq_time = 0;
}
The disassembly for the function is the follows:
{
004005F8 push {r3, lr}
accumulated_samples = 0;
004005FA movs r2, #0
004005FC ldr r3, [pc, #36]
004005FE str r2, [r3]
running_average.adc_samples[0] = 0;
00400600 ldr r3, [pc, #36]
00400602 str r2, [r3]
running_average.adc_samples[1] = 0;
00400604 str r2, [r3, #4]
running_average.adc_samples[2] = 0;
00400606 str r2, [r3, #8]
running_average.adc_samples[3] = 0;
00400608 str r2, [r3, #12]
running_average.adc_samples[4] = 0;
0040060A str r2, [r3, #16]
running_average.adc_samples[5] = 0;
0040060C str r2, [r3, #20]
running_average.overrun = 0;
0040060E strb.w r2, [r3, #32]
running_average.acq_time = 0;
00400612 movs r0, #0
00400614 movs r1, #0
00400616 strd r0, r1, [r3, #24]
Note that r3
in the above is 0x2001ef70
, which is indeed 4-byte aligned. r2
is the literal value 0
.
The str
opcode requires 4-byte alignment. The strd
opcode only requires 4 byte alignment as well, since it appears to really be two sequential 4-byte operations, though I don't know how it actually works internally.
If I intentionally mis-align my struct, to force the slow-path copy operation:
typedef struct adc_samples_t
{
int8_t overrun;
int32_t adc_samples[6];
uint64_t acq_time;
uint8_t padding_1;
uint8_t padding_2;
uint8_t padding_3;
} __attribute__((packed, aligned(8))) adc_sample_set;
I get the following assembly:
{
00400658 push {r3, lr}
accumulated_samples = 0;
0040065A movs r3, #0
0040065C ldr r2, [pc, #84]
0040065E str r3, [r2]
running_average.adc_samples[0] = 0;
00400660 ldr r2, [pc, #84]
00400662 strb r3, [r2, #1]
00400664 strb r3, [r2, #2]
00400666 strb r3, [r2, #3]
00400668 strb r3, [r2, #4]
running_average.adc_samples[1] = 0;
0040066A strb r3, [r2, #5]
0040066C strb r3, [r2, #6]
0040066E strb r3, [r2, #7]
00400670 strb r3, [r2, #8]
running_average.adc_samples[2] = 0;
00400672 strb r3, [r2, #9]
00400674 strb r3, [r2, #10]
00400676 strb r3, [r2, #11]
00400678 strb r3, [r2, #12]
running_average.adc_samples[3] = 0;
0040067A strb r3, [r2, #13]
0040067C strb r3, [r2, #14]
0040067E strb r3, [r2, #15]
00400680 strb r3, [r2, #16]
running_average.adc_samples[4] = 0;
00400682 strb r3, [r2, #17]
00400684 strb r3, [r2, #18]
00400686 strb r3, [r2, #19]
00400688 strb r3, [r2, #20]
running_average.adc_samples[5] = 0;
0040068A strb r3, [r2, #21]
0040068C strb r3, [r2, #22]
0040068E strb r3, [r2, #23]
00400690 strb r3, [r2, #24]
running_average.overrun = 0;
00400692 mov r1, r2
00400694 strb r3, [r1], #25
running_average.acq_time = 0;
00400698 strb r3, [r2, #25]
0040069A strb r3, [r1, #1]
0040069C strb r3, [r1, #2]
0040069E strb r3, [r1, #3]
004006A0 strb r3, [r1, #4]
004006A2 strb r3, [r1, #5]
004006A4 strb r3, [r1, #6]
004006A6 strb r3, [r1, #7]
So, pretty clearly, I'm getting the proper aligned-copy behaviour with my original struct definition, despite the compiler apparently incorrectly warning that it will result in inefficient accesses.
Upvotes: 3