i love stackoverflow

Reputation: 1685

Assemblers and word alignment

Today I learned that if you declare a char variable (which is 1 byte), the assembler actually uses 4 bytes in memory so that the boundaries lie on multiples of the word size.

If a char variable uses 4 bytes anyway, what is the point of declaring it as a char? Why not declare it as an int? Don't they use the same amount of memory?

Upvotes: 2

Views: 720

Answers (4)

Eric Postpischil

Reputation: 223231

When you are writing in assembly language and declare space for a character, the assembler allocates space for one character and no more. (I write in regard to common assemblers.) If you want to align objects in assembly language, you must include assembler directives for that purpose.

When you write in C, and the compiler translates it to assembly and/or machine code, space for a character may be padded. Typically the padding is there not because character objects benefit from alignment but because other objects declared alongside them do. For example, consider what happens when you declare:

char a;
char b;
int i;
char c;
double d;

A naïve compiler might do this:

  • Allocate one byte for a at the beginning of the relevant memory, which happens to be aligned to a multiple of, say, 16 bytes.
  • Allocate the next byte for b.
  • Then it wants to place the int i which needs four bytes. On this machine, int objects must be aligned to multiples of four bytes, or a program that attempts to access them will crash. So the compiler skips two bytes and then sets aside four bytes for i.
  • Allocate the next byte for c.
  • Skip seven bytes and then set aside eight bytes for d. This makes d aligned to a multiple of eight bytes, which is beneficial on this hypothetical machine.

So, even with a naïve compiler, a character object does not require four whole bytes to itself. It can share with neighbor character objects, or other objects that do not require greater alignment. But there will be some wasted space.
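
The in-order placement described above can be sketched in C. This is only an illustration under the hypothetical machine's rules (int: 4 bytes, aligned to 4; double: 8 bytes, aligned to 8). The names struct object, align_up, layout_in_order and print_layout are invented for this sketch and are not anything a compiler exposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One object to place: its name, size, and required alignment in bytes. */
struct object {
    const char *name;
    size_t size;
    size_t align;
    size_t offset;   /* filled in by layout_in_order() */
};

/* Round offset up to the next multiple of align. */
size_t align_up(size_t offset, size_t align)
{
    assert(align > 0);
    return (offset + align - 1) / align * align;
}

/* Place the objects in the order given; return the total bytes used. */
size_t layout_in_order(struct object *objs, size_t count)
{
    size_t offset = 0;
    for (size_t k = 0; k < count; k++) {
        offset = align_up(offset, objs[k].align);  /* skip padding bytes */
        objs[k].offset = offset;
        offset += objs[k].size;
    }
    return offset;
}

/* Print each object's offset and the total, for inspection. */
void print_layout(const struct object *objs, size_t count, size_t total)
{
    for (size_t k = 0; k < count; k++)
        printf("%s: offset %zu, size %zu\n",
               objs[k].name, objs[k].offset, objs[k].size);
    printf("total: %zu bytes\n", total);
}
```

Fed a, b, i, c, d with sizes and alignments 1, 1, 4, 1, 8, layout_in_order places them at offsets 0, 1, 4, 8 and 16 and returns 24. The objects themselves need 15 bytes, so 9 bytes are padding.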

A smarter compiler will do this:

  • Sort the objects it has to allocate space for according to their alignment requirements.
  • Place the most restrictive object first: Set aside eight bytes for d.
  • Place the next most restrictive object: Set aside four bytes for i. Note that i is aligned to a multiple of four bytes because it follows d, which is an eight-byte object aligned to a multiple of eight bytes.
  • Place the least restrictive objects: Set aside one byte each for a, b, and c.

This sort of reordering avoids wasting space, and any decent compiler will use it for memory that it is free to arrange (such as automatic objects on the stack or static objects in global memory).
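
The sorted placement can be sketched the same way. Again this is a toy model, not how any particular compiler is implemented; struct object, by_alignment_desc and layout_sorted are hypothetical names, and the sizes and alignments are those of the hypothetical machine above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One object to place: its name, size, and required alignment in bytes. */
struct object {
    const char *name;
    size_t size;
    size_t align;
    size_t offset;   /* filled in by layout_sorted() */
};

/* qsort comparator: most restrictive (largest) alignment first. */
int by_alignment_desc(const void *lhs, const void *rhs)
{
    const struct object *x = lhs;
    const struct object *y = rhs;
    return (x->align < y->align) - (x->align > y->align);
}

/* Sort by alignment, then place the objects one after another.
   Returns the total number of bytes used. */
size_t layout_sorted(struct object *objs, size_t count)
{
    qsort(objs, count, sizeof objs[0], by_alignment_desc);

    size_t offset = 0;
    for (size_t k = 0; k < count; k++) {
        size_t align = objs[k].align;
        assert(align > 0);
        offset = (offset + align - 1) / align * align;  /* round up */
        objs[k].offset = offset;
        offset += objs[k].size;
    }
    return offset;
}
```

For the same five objects, d lands at offset 0, i at offset 8, and the three characters at offsets 12, 13 and 14, for a total of 15 bytes with no padding between objects. qsort is not a stable sort, so the relative order of a, b and c is not guaranteed.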

When you declare members inside a struct, the compiler is required to use the order in which you declare the members, so it cannot perform this reordering to save space. In that case, declaring a mixture of character objects and other objects can waste space.
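
One way to see this on your own compiler is to compare a struct in the declaration order used earlier with one whose members you sorted by hand. This is a small sketch; the struct names and print_struct_layouts are made up for the illustration, and the printed sizes depend on your platform's ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Members in the order from the example: the compiler must keep it. */
struct declared_order {
    char a;
    char b;
    int i;
    char c;
    double d;
};

/* Same members, reordered by hand from largest to smallest. */
struct sorted_by_hand {
    double d;
    int i;
    char a;
    char b;
    char c;
};

/* The first member of a struct always starts at offset 0. */
static_assert(offsetof(struct declared_order, a) == 0,
              "first member starts at offset 0");

/* Print member offsets and total sizes for both layouts. */
void print_struct_layouts(void)
{
    printf("declared order: a@%zu b@%zu i@%zu c@%zu d@%zu, size %zu\n",
           offsetof(struct declared_order, a),
           offsetof(struct declared_order, b),
           offsetof(struct declared_order, i),
           offsetof(struct declared_order, c),
           offsetof(struct declared_order, d),
           sizeof(struct declared_order));
    printf("sorted by hand: size %zu\n", sizeof(struct sorted_by_hand));
}
```

On common 64-bit ABIs I would expect the first struct to come out at 24 bytes and the second at 16 (the compiler also pads the end of a struct so that arrays of it stay aligned), but treat those numbers as typical rather than guaranteed.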

Upvotes: 5

old_timer

Reputation: 71556

Others have for the most part answered this. Assuming a char is a single byte, does declaring a char mean that it always pads to an alignment? No. Some compilers pad by default and some don't, and with many you can change the default through some option or directive.

Does this mean you shouldn't use a char? It depends. First off, the padding doesn't always happen, so the few bytes aren't always wasted. And you are programming in a high-level language using a compiler, so if you think you have only 3 wasted bytes in your whole binary, think again.

Depending on the architecture, using chars can bring some savings; for example, loading immediates saves you three bytes or more on some architectures. On other architectures, even simple operations on the register need extra instructions to sign-extend or clip the larger register so that it behaves like a byte-sized register.

If you are on a 32-bit computer and you are using an 8-bit char because you are only counting from 1 to 100, you might want to use a full-sized int; in the long run you are probably not saving anyone anything by using the char. An 8086-based PC running DOS is a different story, and on an 8-bit microcontroller you want to lean toward 8-bit variables as much as possible.

Upvotes: 0

paulsm4

Reputation: 121759

Q: Does a program allocate four bytes for every "char" you declare?

A: No - absolutely not ;)

Q: Is it possible that, if you allocate a single byte, the program might "pad" with extra bytes?

A: Yes - absolutely yes.

The issue is "alignment". Some computer architectures can only access a data value at an address that is a multiple of a particular size: 16 bits, 32 bits, etc. Other architectures simply perform better when every access is aligned that way. Hence "padding".
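
As a rough sketch of the arithmetic involved, the number of padding bytes is whatever it takes to reach the next multiple of the required alignment. The helper names padding_needed and is_aligned are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes to skip so that offset becomes a multiple of align. */
size_t padding_needed(size_t offset, size_t align)
{
    assert(align > 0);
    return (align - offset % align) % align;
}

/* Nonzero if the address is a multiple of align. */
int is_aligned(const void *address, size_t align)
{
    assert(align > 0);
    return (uintptr_t)address % align == 0;
}
```

For instance, padding_needed(2, 4) is 2 and padding_needed(9, 8) is 7, while an offset that is already aligned, such as padding_needed(8, 8), needs 0.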

Upvotes: 3

Kerrek SB

Reputation: 477318

There may indeed not be any point in declaring a single char variable.

There may however be many good reasons to want a char-array, where an int-array really wouldn't do the trick!

(Try padding a data structure with ints...)
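
To make the array point concrete, here is a small sketch comparing the storage of a char array and an int array with the same element count. ELEMENT_COUNT, as_chars, as_ints and print_array_sizes are names chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum { ELEMENT_COUNT = 1000 };

char as_chars[ELEMENT_COUNT];
int as_ints[ELEMENT_COUNT];

/* Array elements are contiguous and sizeof(char) is 1 by definition,
   so a char array occupies exactly one byte per element. */
static_assert(sizeof as_chars == ELEMENT_COUNT,
              "no padding between char array elements");

/* Print how much storage each array occupies. */
void print_array_sizes(void)
{
    printf("char array: %zu bytes\n", sizeof as_chars);
    printf("int array:  %zu bytes\n", sizeof as_ints);
}
```

Whatever padding a lone char may or may not get, the 1000 chars in the array take exactly 1000 bytes, while the int array takes 1000 * sizeof(int), which is 4000 bytes where int is 4 bytes.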

Upvotes: 0
