CatsLoveJazz
CatsLoveJazz

Reputation: 869

Explanation of packed attribute in C

I was wondering if anyone could offer a more full explanation to the meaning of the packed attribute used in the bitmap example in pset4.

"Our use, incidentally, of the attribute called packed ensures that clang does not try to "word-align" members (whereby the address of each member’s first byte is a multiple of 4), lest we end up with "gaps" in our structs that don’t actually exist on disk."

I do not understand the comment around gaps in our structs. Does this refer to gaps in the memory location between each struct (i.e. one byte between each 3 byte RGB if it was to word-algin)? Why does this matter in for optimization?

typedef uint8_t BYTE;
typedef struct
{
    BYTE  rgbtBlue;
    BYTE  rgbtGreen;
    BYTE  rgbtRed;
} __attribute__((__packed__))
RGBTRIPLE;

Upvotes: 3

Views: 3091

Answers (3)

Steve
Steve

Reputation: 181

Sometimes, the elements of a struct are simply aligned to a 4-byte boundary (or whatever the size of a register is in the CPU) to optimize read/write access to RAM. Often, smaller elements are packed together, but alignment is dictated by a larger type in the struct.

In your case, you probably don't need to pack the struct, but it doesn't hurt.

With some compilers, each byte in your struct could end up taking 4 bytes of RAM each (so, 12 bytes for the entire struct). Packing the struct removes the alignment requirement for each of the BYTEs, and ensures that the entire struct is placed into one 4-byte DWORD (unless the alignment for the entire program is set to one byte, or the struct is in an array of said structs, in that case it would literally be stored in 3 contiguous bytes of RAM).

See comments below for further discussion...

Upvotes: 2

Jonathan Leffler
Jonathan Leffler

Reputation: 754170

Beware: prejudices on display!

As noted in comments, when the compiler adds the padding to a structure, it does so to improve performance. It uses the alignments for the structure elements that will give the best performance.

Not so very long ago, the DEC Alpha chips would handle a 'unaligned memory request' (umr) by doing a page fault, jumping into the kernel, fiddling with the bytes to get the required result, and returning the correct result. This was painfully slow by comparison with a correctly aligned memory request; you avoided such behaviour at all costs.

Other RISC chips (used to) give you a SIGBUS error if you do misaligned memory accesses. Even Intel chips have to do some fancy footwork to deal with misaligned memory accesses.

The purpose of removing padding is to (decrease performance but) benefit by being able to serialize and unserialize the data without doing the job 'properly' — it is a form of laziness that actually doesn't work properly when the machines communicating are not of the same type, so proper serialization should have been done in the first place.

What I mean is that if you are writing data over the network, it seems simpler to be able to send the data by writing the contents of a structure as a block of memory (error checking etc omitted):

write(fd, &structure, sizeof(structure));

The receiving end can read the data:

read(fd, &structure, sizeof(structure));

However, if the machines are of different types (for example, one has an Intel CPU and the other a SPARC or Power CPU), the interpretation of the data in those structures will vary between the two machines (unless every element of the array is either a char or an array of char). To relay the information reliably, you have to agree on a byte order (e.g. network byte order — this is very much a factor in TCP/IP networking, for example), and the data should be transmitted in the agreed upon order so that both ends can understand what the other is saying.

You can define other mechanisms: you could use a 'sender makes right' mechanism, in which the 'receiver' let's the sender know how it wants the data presented and the sender is responsible for fixing up the transmitted data. You can also use a 'receiver makes right' mechanism which works the other way around. Both these have been used commercially — see DRDA for one such protocol.


Given that the type of BYTE is uint8_t, there won't be any padding in the structure in any sane (commercially viable) compiler. IMO, the precaution is a fantasy or phobia without a basis in reality. I'd certainly need a carefully documented counter-example to believe that there's an actual problem that the attribute helps with.

I was led to believe that you could encounter issues when you pass the entire struct to a function like fread as it assumes you're giving it an array like chunk of memory, with no gaps in it. If your struct has gaps, the first byte ends up in the right place, but the next two bytes get written in the gap, which you don't have a proper way to access.

Sorta...but mostly no. The issue is that the values in the padding bytes are indeterminate. However, in the structure shown, there will be no padding in any compiler I've come across; the structure will be 3 bytes long. There is no reason to put any padding anywhere inside the structure (between elements) or after the last element (and the standard prohibits padding before the first element). So, in this context, there is no issue.

If you write binary data to a file and it has holes in it, then you get arbitrary byte values written where the holes are. If you read back on the same (type of) machine, there won't actually be a problem. If you read back on a different (type of) machine, there may be problems — hence my comments about serialization and deserialization. I've only been programming in C a little over 30 years; I've never needed packed, and don't expect to. (And yes, I've dealt with serialization and deserialization using a standard layout — the system I mainly worked on used big-endian data transfer, which corresponds to network byte order.)

Upvotes: 5

Pedro David
Pedro David

Reputation: 377

The objective is exactly what you said, not having gaps between each struct. Why is this important? Mostly because of cache. Memory access is slow!!! Cache is really fast. If you can fit more in cache you avoid cache misses (memory accesses).

Edit: Seems I was wrong, didn't seem really useful if the objective was structure padding since the struct has 3 BYTE

Upvotes: -1

Related Questions