Reputation: 2115

Padding in structures in C

This is an interview question. Until now, I thought such questions were purely compiler-dependent and nothing to worry about, but now I am rather curious about it.

Suppose you are given two structures:

struct A {
  int* a;
  char b;
};

and

struct B {
  char a;
  int* b;
};

So which one would you prefer, and why? My answer (though I was somewhat shooting in the dark) was that the first structure should be preferred, since the compiler allocates space for a structure in multiples of the word size (the size of a pointer: 4 bytes on 32-bit machines and 8 bytes on 64-bit ones). So for both structures the compiler would allocate 8 bytes (assuming a 32-bit machine). But in the first case, the padding comes after all my variables (i.e. after a and b). So even if, by some chance, b gets a value that overflows and destroys the padding bytes after it, my a is still safe.

He didn't seem very pleased and asked for one disadvantage of the first structure over the second. I didn't have much to say. :D

Please help me with the answers.

Upvotes: 43

Answers (5)

MByD

Reputation: 137442

I don't think either of these structures has an advantage. There is one(!) constant in this equation: the order of the members of the struct is guaranteed to be as declared.

So in a case like the following, the second structure might have an advantage, since it probably has a smaller size, but not in your example, where both will probably have the same size:

struct {
    char a;
    int b;
    char c;
} X;

Vs.

struct {
    char a;
    char b;
    int c;
} Y;

A little more explanation regarding comments below:

None of the below is 100% guaranteed, but it is the common way the structs are laid out on a 32-bit system where int is 32 bits:

Struct X:

|     |     |     |     |     |     |     |     |     |     |     |     |
 char  pad    pad   pad   ---------int---------- char   pad   pad   pad   = 12 bytes

struct Y:

|     |     |     |     |     |     |     |     |
 char  char  pad   pad   ---------int----------        = 8 bytes

Upvotes: 37

Steve Jessop

Reputation: 279435

I can't think of a disadvantage of the first structure over the second in this particular case, but it's possible to come up with examples where there are disadvantages to the general rule of putting the largest members first:

struct A {  
    int* a;
    short b;
    A(short num) : b(2*num+1), a(new int[b]) {} 
    // OOPS, `b` is used uninitialized, and a good compiler will warn. 
    // The only way to get `b` initialized before `a` is to declare 
    // it first in the class, or of course we could repeat `2*num+1`.
};

I have also heard of a rather complicated case for large structs, where the CPU has fast addressing modes for pointer+offset with small offsets (up to 8 bits, for example, or some other limit on an immediate value). You best micro-optimize a large structure by putting as many of the most commonly used fields as possible within range of the fastest instructions.

The CPU might even have fast addressing for pointer+offset and pointer+4*offset. Then suppose you had 64 char fields and 64 int fields: if you put the char fields first then all fields of both types can be addressed using the best instructions, whereas if you put the int fields first then the char fields that aren't 4-aligned will just have to be accessed differently, perhaps by loading a constant into a register rather than with an immediate value, because they're outside the 256-byte limit.

Never had to do it myself, and for instance x86 allows big immediate values anyway. It's not the sort of optimization that anyone would normally think about unless they spend a lot of time staring at assembly.

Upvotes: 4

alecov

Reputation: 5171

Briefly, there's no advantage in choosing either in the general case. The only situation where the choice would matter in practice is if structure packing is enabled, in which case struct A would be the better choice (since both fields would be aligned in memory, while in struct B the b field would be located at an odd offset). Structure packing means that no padding bytes are inserted inside the structure.

However, this is a rather uncommon scenario: structure packing is generally only enabled in specific situations and is not a concern in most programs. Nor can it be controlled through any portable construct in the C standard.

Upvotes: 2

cnicutar

Reputation: 182764

Some machines access data more efficiently when the values are aligned to some boundary. Some require data to be aligned.

On modern 32-bit machines like the SPARC or the Intel [34]86, or any Motorola chip from the 68020 up, each data item must usually be "self-aligned", beginning on an address that is a multiple of its type size. Thus, 32-bit types must begin on a 32-bit boundary, 16-bit types on a 16-bit boundary, 8-bit types may begin anywhere, and struct/array/union types have the alignment of their most restrictive member.

So you could have

struct B {
    char a;
    /* 3 bytes of padding? More (7 with 8-byte pointers)? */
    int* b;
};

A simple rule that minimizes padding in the "self-aligned" case (and does no harm in most others) is to order your struct members by decreasing size.

Personally, I see no disadvantage of the first struct compared to the second.

Upvotes: 11

Kevin

Reputation: 25269

This is also something of a guess, but most compilers have a misalign option that explicitly omits padding bytes. On some platforms this then requires a runtime fixup (a hardware trap) to handle misaligned accesses on the fly, with a corresponding performance penalty. If I remember right, HPUX fell into this category. So in the first struct the fields are still aligned even when misalign compiler options are used (because, as you said, the padding would be at the end).

Upvotes: 1
