Reputation: 1971

Custom allocator and memory alignment

I'm trying to implement custom allocator to work with std containers based on the requirements here: https://en.cppreference.com/w/cpp/named_req/Allocator

I'm currently trying to implement a linear allocator and I'm having hard time with memory alignment.
After I allocate a block of memory I'm wondering how much padding do I need between each object in the block to optimize cpu read/writes. I'm not sure if the address alignment should be divisible

by the cpu word size (4 bytes on 32 bits machine and 8 bytes on 64 bits machine)
by the sizeof(T)
by the alignof(T)

I read different answers different places.
For example in this question the accepted answers says:

The usual rule of thumb (straight from Intels and AMD's optimization manuals) is that every data type should be aligned by its own size. An int32 should be aligned on a 32-bit boundary, an int64 on a 64-bit boundary, and so on. A char will fit just fine anywhere.

So by that answer it looks like the address alignment should be divisible by sizeof(T).

On this question the second answer state that:

The CPU always reads at its word size (4 bytes on a 32-bit processor), so when you do an unaligned address access — on a processor that supports it — the processor is going to read multiple words.

So by that answer it looks like the address alignment should be divisble by the cpu word size.

So I'm seeing some conflicted statements on how to optimize data alignment for cpu read/write and I'm not sure if I'm not understanding something correctly or there're some wrong answers? Maybe someone could clear this out for me on what the address alignment should be divisible by.

Upvotes: 3

Answers (2)

danielschemmel

Reputation: 11116

As a general rule-of-thumb (that is, do this unless you have a good reason to do otherwise), you want to align elements of a given C++ type to their alignment, i.e., alignof(T). If the type wants to be aligned to a 32-bit boundary (as int in most common c++ implementation is implemented), it will exhibit a fitting (4-byte) alignment.

Of course, between the base addresses of two different objects of type T there must be at least sizeof(T) bytes of room, which will usually be an integer-valued multiple of its alignment (it is actually quite hard to pass an over-aligned type to a template function, as it will strip any outer alignas attribute).

In most use-cases, you will therefore be fine by doing the following: Find the first base-address in your underlying storage that is aligned to alignof(T) and then go forward from there in steps of sizeof(T).

This way, you will rely on the users of your allocator to tell you what they want. This is exactly what you want, as the optimizer may rely on knowledge about alignment and, e.g., emit SSE aligned loads for arrays of double-precision floats, which will cause your program to crash if they are aligned wrongly.

Going down the rabbit hole

This gives rise to the following possible situations:

Easy type, has word length and word alignment (e.g., int with sizeof(int) = 4 and alignof(int) = 4):

sizeof(T) = 4 and alignof(T) = 4
 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F 
[aaaaaaaaaa][bbbbbbbbbb][cccccccccc][dddddddddd]

Types which have a size that is a multiple of its alignment (e.g., using T = int[2])

sizeof(T) = 8 and alignof(T) = 4
 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F 
[aaaaaaaaaaaaaaaaaaaaaa][bbbbbbbbbbbbbbbbbbbbbb]

Overaligned types, which have a larger alignment than size (e.g., using T = alignas(8) char[3]). Here be dragons!

sizeof(T) = 3 and alignof(T) = 8
 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F 
[aaaaaaa]               [bbbbbbb]

Note that there is unused space in the overaligned example. This is necessary, as objects that are aligned to an 8-byte boundary may not be put anywhere else, leading to potential wastage. The most common use for types such as this is CPU-specific optmizations, such as to prevent false sharing.

Finally, there is the slightly weird case of objects whose size is larger than, but not an integer multiple of their alignment (e.g., using T = alignas(4) char[5];). This is basically just a small extension to the previous example of overaligned types:

sizeof(T) = 5 and alignof(T) = 4
 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F 
[aaaaaaaaaaaaa]         [bbbbbbbbbbbbb]

While the alignment would make it possible to place the second object at base address 4, there is already an object there.

Putting all these examples together, the number of bytes that needs to be between the base addresses of two objects of type T is:

inline auto object_distance = sizeof(T) % alignof(T) == 0 ? sizeof(T) : sizeof(T) + (alignof(T) - sizeof(T) % alignof(T));

Upvotes: 3

Nicol Bolas

Reputation: 473447

After I allocate a block of memory I'm wondering how much padding do I need between each object in the block to optimize cpu read/writes.

Precisely zero padding between objects; you're not allowed to add padding. In the C++ standard library allocator model, your allocator<T>::allocate(count) method is required to allocate sufficient space to store an array of count objects of type T. Arrays in C++ are tightly packed; the offset from one T in the array to another T is required to be sizeof(T).

So you can't insert padding between objects in the allocated storage. You can insert padding at the beginning of the block of memory you allocate, so that you can be accurate with alignof(T) (which your allocator<T>::allocate is also required to respect). But the returned pointer has to be a pointer to the aligned storage for the Ts. So if you have padding in the front of the allocation, you'll need some way to undo the padding when deallocate is called, since it only gets the aligned storage address.

When it comes to alignment of structs that contain fundamental types, you're relying on the compiler to impose its alignment requirements on those structs. So for this definition:

struct U
{
  std::int32_t i;
  std::int64_t j;
};

If the compiler deems that it would be more optimal for int64_t's to be on 8-byte alignments, then the compiler will insert appropriate padding between i and j in U. sizeof(U) will be 16, and alignof(U) will be 8.

Creating that alignment is not your job, and you're not allowed to do it for the compiler. You must simply respect the alignment of any type you're given in your allocator<T>::allocate calls.

Upvotes: 1

Custom allocator and memory alignment

Answers (2)

Going down the rabbit hole

Related Questions