Reputation: 9484
I am writing some sample programs to explore C and would like to know why structure packing/alignment can only be specified as a power of two.
#include <stdio.h>

#pragma pack(push, 3)
union aaaa
{
    struct bbb
    {
        int a;
        double b;
        char c;
    } xx;
    float f;
};
#pragma pack(pop)

int main(void)
{
    printf("\n Size: %zu", sizeof(union aaaa));
    return 0;
}
While compiling, I get these warnings:
warning: alignment must be a small power of two, not 3 [-Wpragmas]
warning: #pragma pack (pop) encountered without matching #pragma pack (push) [-Wpragmas]
It seems the #pragma has no effect. The output is 24, i.e., the default alignment is used.
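For comparison, here is a minimal sketch of the same union with a valid power-of-two packing value (the name packed_aaaa is arbitrary; on a typical 64-bit GCC target this prints 13, but the exact size is implementation-dependent):

#include <stdio.h>

/* Same members, but packed to 1 byte, which is a valid power of two. */
#pragma pack(push, 1)
union packed_aaaa
{
    struct
    {
        int a;     /* 4 bytes */
        double b;  /* 8 bytes, no padding before it */
        char c;    /* 1 byte, no tail padding */
    } xx;
    float f;
};
#pragma pack(pop)

int main(void)
{
    printf("Packed size: %zu\n", sizeof(union packed_aaaa));  /* typically 13 */
    return 0;
}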
Upvotes: 6
Views: 14337
Reputation: 223231
The short answer is that basic objects in processors have sizes that are small powers of two (e.g., 1, 2, 4, 8, and 16 bytes) and memory is organized in groups whose size is a small power of two (e.g., 8 bytes), so structures must be aligned to work well with those sizes.
The long answer is that the reasons for this are grounded in physics and elementary mathematics. Computers naturally work with bits, with values 0 and 1. This is because it is easy to design physical things that switch between two values: High voltage and low voltage, presence of a charge or absence of a charge, et cetera. Distinguishing between three values is harder, because you have to be more sensitive about transitions between values. So, as computer technology has developed over the decades, we have used bits (binary digits) instead of alternatives like trinary digits.
To make larger numbers, we combine multiple bits. So two bits can, combined, have four values. Three bits can have eight values, and so on. In older computers, sometimes bits were grouped six or ten at a time. However, eight became common, and is essentially standard now. Using eight bits for a byte does not have as strong a physical reason as some of the other groupings I describe, but it is the way of the world.
Another feature of computers is memory. Once we have these bytes, we want to store a lot of them, in a device that is easily accessible to the processor, so we can get lots of bytes into and out of the processor quickly. When we have a lot of bytes, we need a way for the processor to tell memory which bytes the processor wants to read or write. So the processor needs a way to address the bytes.
The processor uses bits for values, so it is going to use bits for address values. So the memory will be built to accept bits to indicate which bytes to supply to the processor when the processor reads or which bytes to store when the processor writes. What does the memory device do with those bits? An easy thing is to use one bit to control one switch of the pathways to memory. The memory will be made out of many small parts that store bytes.
Consider a thing in the memory device that can store a byte, and consider two of those things next to each other, say A and B. We can use a switch to select whether we want the A byte to be active or the B byte to be active. Now consider four of those things, say A, B, C, and D. We can use one switch to select whether to use the A-B group or to use the C-D group. Then another switch selects A or B (if using the A-B group) or C or D (if using the C-D group).
This process continues: Each bit in a memory address selects a group of storage units to use. 1 bit selects between 2 storage units, 2 select between 4, 3 select between 8, 4 select between 16, and so on. 8 bits select between 256 storage units, 24 bits select between 16,777,216 storage units, and 32 bits select between 4,294,967,296 storage units.
There is one more complication. Moving individual bytes between the processor and memory is slow. Instead, modern computers organize memory into larger pieces, such as eight bytes. You can only move eight bytes at a time between memory and the processor. When the processor requests that memory supply some data, the processor only sends the high bits of the address—the low three bits select individual bytes within eight bytes, and they are not sent to memory.
This is faster because the processor gets eight bytes in the time it would otherwise take to have the memory do all its switching to supply one byte, and it is cheaper because you do not need the huge number of extra switches it would take to distinguish individual bytes in memory.
However, now it means the processor has no way of getting an individual byte from memory. When you execute an instruction that accesses an individual byte, the processor must read eight bytes from memory and then shift those bytes around inside the processor to get the one byte you want. Similarly, to get two or four bytes, the processor reads eight bytes and extracts just the bytes you want.
To simplify this process, processor designers specify that data should be aligned in certain ways. Typically, they require two-byte data (like 16-bit integers) to be aligned to multiples of two bytes, four-byte data (like 32-bit integers and 32-bit floating-point values) to be aligned to multiples of four bytes, and eight-byte data to be aligned to multiples of eight bytes.
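As a rough illustration, you can ask the compiler what alignments it actually uses with C11's _Alignof; this is only a sketch, and the values in the comments are typical for 64-bit x86 and may differ on other targets:

#include <stdio.h>
#include <stdalign.h>

int main(void)
{
    printf("char:   %zu\n", alignof(char));    /* typically 1 */
    printf("short:  %zu\n", alignof(short));   /* typically 2 */
    printf("int:    %zu\n", alignof(int));     /* typically 4 */
    printf("double: %zu\n", alignof(double));  /* typically 8 */
    return 0;
}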
This required alignment has two effects. First, because four-byte data can only start at two places in an eight-byte chunk read from memory (the beginning or the middle), the processor designers only need to put in wires to extract the four bytes from those two places. They do not need to add all the extra wires it would take to extract four bytes from any of the eight individual bytes that could be starting places if arbitrary alignment were allowed. (Some processors prohibit loading unaligned data entirely; others allow it but fall back to slower methods that use fewer wires and an iterative algorithm to shift the data over several processor cycles, so unaligned loads are slow.)
The second effect is that, because four-byte data can only start at two places in an eight-byte chunk, it also ends inside that chunk. Consider what would happen if you tried to load four bytes of data starting two bytes before the end of an eight-byte chunk. The first two bytes are in that chunk, but the next two bytes are in the next chunk in memory. The processor would have to read two chunks from memory, take different bytes from each of them, and put those bytes together. That is much slower than just reading one chunk.
So, memory is organized by powers of two because that is a natural result of bits, and processors require alignment because that makes memory access more efficient. The alignment is naturally powers of two, and that is why your structure sizes work better when they are multiples of the powers of two that are used for alignment.
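To tie this back to the struct in the question, here is a small sketch that prints where each member actually lands; the offsets in the comments are what a typical 64-bit target produces and account for the size of 24:

#include <stdio.h>
#include <stddef.h>

struct bbb
{
    int a;     /* offset 0 */
    double b;  /* offset 8: four bytes of padding are inserted after a */
    char c;    /* offset 16 */
};             /* seven bytes of tail padding bring the size to 24 */

int main(void)
{
    printf("a: %zu, b: %zu, c: %zu, size: %zu\n",
           offsetof(struct bbb, a),
           offsetof(struct bbb, b),
           offsetof(struct bbb, c),
           sizeof(struct bbb));
    return 0;
}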
Upvotes: 22
Reputation: 70929
The memory bus is only so many bytes wide, and it is typically a power of two bytes wide because that makes the most effective use of the bits in a field.
A three-bit field like this
[0][0][0]
can represent eight numbers:
0, 1, 2, 3, 4, 5, 6, and 7
If you limited yourself to the numbers
0, 1, 2
Then you would be wasting the highest order bit, which would always be zero. Hardware and software designers in the early days of computing needed every bit they could grab because memory was very expensive, so this kind of waste was designed out of the system.
Later, when memory subsystems grew, aligned access became cheaper to design. Aligned access requires that the start of a data element is on certain boundaries to reduce the number of transfers across the memory bus, and to reduce the number of computations in bus management.
The "power of two" requirement was a side effect of bus architecture coupled with a simple routine that assured C data structures could be aligned with aligned access boundaries.
Upvotes: 3
Reputation: 81197
While there are physical reasons that computer designs favor power-of-two memory alignments, another equally important requirement is that all memory alignments must divide evenly into some common number. Otherwise, if e.g. one struct needed to be aligned on a multiple of 7, one on a multiple of 8, and one on a multiple of 9, a union containing all three structs would have to be aligned to a multiple of 504 (the least common multiple of 7, 8, and 9).
The normal way that the divisibility requirement is handled is to say that all alignment sizes must subdivide some large power of two. Such an approach works, but it is not the only feasible implementation. One could, if one were so inclined, design hardware that worked with 120-byte cache lines and allowed objects to be aligned on multiples of 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, 60, or 120 bytes. Such a design could actually be quite reasonable from a hardware standpoint (each cache line would be stored as 1024 bits, including 64 bits of error-correction, write-tagging, or other information) and would permit the efficient storage of 80-bit reals and of RGB or XYZ triplets (of any numeric type, including 80-bit reals). If one didn't require that physical addresses be in the same numerical sequence as logical addresses, the circuitry required to map 120-byte ranges to cache lines would not be overly expensive. On the other hand, outside of specialized applications, it's unlikely that the benefits of non-power-of-two mappings would be sufficient to overcome issues of cost and market inertia.
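As a small illustration of the divisibility point under the usual power-of-two scheme: because every smaller alignment divides the largest one, a union's alignment is simply that of its most demanding member. This is only a sketch; the union here is invented for illustration, and the printed value is typical for a 64-bit target:

#include <stdio.h>
#include <stdalign.h>

union example
{
    char c;     /* alignment 1 */
    int i;      /* alignment typically 4 */
    double d;   /* alignment typically 8 */
};

int main(void)
{
    /* Because 1 and 4 both divide 8, aligning the union to 8 bytes
       satisfies every member; typically prints 8 on a 64-bit target. */
    printf("alignment of union: %zu\n", alignof(union example));
    return 0;
}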
Upvotes: 2
Reputation: 1
According to the GCC documentation (http://gcc.gnu.org/onlinedocs/gcc/Structure_002dPacking-Pragmas.html), the pack pragma is provided "For compatibility with Microsoft Windows compilers".
If you search for MS documentation on alignment (http://msdn.microsoft.com/en-us/library/ms253949(v=vs.80).aspx), you find that the MSVC compiler ensures alignment of types based on the data size, which, as explained in the other posts, is logically always a power of 2.
If your problem is that you need some fancy alignment of a data structure (to access some ill-conceived memory-mapped peripheral), then your best bet is to pack the structure manually by adding fields that supply the required padding.
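For example, here is a sketch of that manual-padding approach for a hypothetical memory-mapped register block; the register names and offsets are invented for illustration:

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical peripheral layout: a 32-bit control register at offset 0,
   an 8-bit status register at offset 6, and a 32-bit data register at
   offset 7. Explicit pad fields pin the layout down instead of asking
   the compiler for a non-power-of-two packing. */
#pragma pack(push, 1)
struct peripheral_regs
{
    uint32_t control;      /* offset 0 */
    uint8_t  reserved[2];  /* offsets 4-5 */
    uint8_t  status;       /* offset 6 */
    uint32_t data;         /* offset 7, deliberately unaligned */
};
#pragma pack(pop)

int main(void)
{
    /* Verify the layout matches the (hypothetical) hardware map. */
    printf("control: %zu, status: %zu, data: %zu, size: %zu\n",
           offsetof(struct peripheral_regs, control),
           offsetof(struct peripheral_regs, status),
           offsetof(struct peripheral_regs, data),
           sizeof(struct peripheral_regs));
    return 0;
}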
Upvotes: 0
Reputation: 126867
Because otherwise it wouldn't make sense. You add padding to structures because CPUs work faster on aligned data (and, on some architectures, they don't work at all on unaligned data), and the alignment requirements of the various data types are always small powers of two (at least, on any architecture I've heard of).
Still, if for some strange reason you do require arbitrary alignment, nothing prevents you from adding dummy char arrays in the right spots to enforce your alignment (which is more or less what the compiler does under the hood anyway).
Upvotes: 9