user2138149

Reputation: 17276

What is pointer alignment in C++ and why is it significant in the context of creating undefined behaviour?

I received comments on a recent question which suggest that pointers which are not aligned can cause undefined behaviour when dereferenced (at least in newer C++ standards).

My question consists of two parts.

BTW - since my base level understanding of this subject matter is clearly quite limited, it may be the case that I misunderstood the comments which were made, or did not understand them completely. So it is possible this question doesn't make much sense.

Upvotes: 1

Views: 131

Answers (1)

Sebastian Redl

Reputation: 72063

What is alignment?

Typical memory doesn't deliver data a byte at a time, and typical CPUs don't want it that way. The data bus between them is 2, 4, or even 8 bytes wide. (Sometimes even more, but we'll go with this for now.) Moreover, you can't just read 8 bytes from any address, but only from an address that is divisible by 8. Such an address is called aligned.

So if you want to read 8 bytes from an address that isn't divisible by 8 (unaligned), what can you do? Well, you can read some bytes from the next lower aligned address, and some from the next upper, and then combine those you want together and throw the rest away. That's an unaligned load.
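To make the "read two aligned chunks and combine" idea concrete, here is a minimal sketch. It assumes a made-up 4-byte-wide bus: fetch_aligned4 and load_u32_unaligned are hypothetical names for illustration, not anything a real CPU or library exposes, and the caller is assumed to guarantee that both chunks are readable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of a 4-byte-wide bus: it can only fetch 4 bytes
// starting at an offset that is a multiple of 4.
void fetch_aligned4(const unsigned char* mem, std::size_t offset, unsigned char* out) {
    assert(offset % 4 == 0);  // the "hardware" refuses anything else
    std::memcpy(out, mem + offset, 4);
}

// Emulate an unaligned 4-byte load using only aligned fetches.
std::uint32_t load_u32_unaligned(const unsigned char* mem, std::size_t offset) {
    const std::size_t skip = offset % 4;      // bytes to throw away at the front
    const std::size_t lower = offset - skip;  // next lower aligned offset

    unsigned char window[8] = {};
    fetch_aligned4(mem, lower, window);              // lower aligned chunk
    if (skip != 0) {
        fetch_aligned4(mem, lower + 4, window + 4);  // next upper aligned chunk
    }

    // Keep the 4 bytes we actually wanted and discard the rest.
    std::uint32_t value;
    std::memcpy(&value, window + skip, sizeof value);
    return value;
}
```

With a 12-byte buffer, load_u32_unaligned(mem, 3) fetches the chunks at offsets 0 and 4 and returns bytes 3 through 6, interpreted in the machine's native byte order.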

But while you the programmer can do that explicitly, whether the CPU will do it if you give it an unaligned address to load is a different question. Some CPU architectures will (e.g. x86), but usually at the cost of performance. Some won't, and instead will fault. Sometimes it depends on the instruction, e.g. SSE has both an aligned load instruction that will fault on unaligned addresses and an unaligned load instruction that won't.

In programming language terms, where we abstract away from the hardware, types have alignment requirements. For example, a 4-byte int typically has 4-byte alignment. If you access an int through a pointer whose address is not a multiple of that alignment, the access is unaligned, and it is undefined behavior.
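In C++ you can query a type's alignment with alignof, and the usual way to read a value out of possibly misaligned bytes without undefined behavior is to copy it with std::memcpy rather than dereference a misaligned pointer. The sketch below shows both. The helper names is_aligned_for and read_int_at are my own, and the address check assumes that converting a pointer to std::uintptr_t yields the numeric address, which holds on mainstream platforms but is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if `p` is suitably aligned for an object of type T.
// Assumes the pointer-to-integer conversion yields the plain address.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Hypothetical helper: read an int stored at a possibly misaligned address.
// memcpy copies byte by byte as far as the language is concerned, so no
// misaligned int* is ever dereferenced.
int read_int_at(const unsigned char* bytes) {
    int value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// By contrast, this would be undefined behavior when `bytes` is misaligned:
//     int value = *reinterpret_cast<const int*>(bytes);
```

Compilers typically turn the memcpy into a single load instruction on architectures that tolerate unaligned access, so the safe version usually costs nothing there.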

So why undefined behavior?

Because that's generally how C dealt with differences between platforms. If doing something yields different results on different platforms, then very often the language just says it is undefined. Compare signed integer overflow (has different behavior on some old platforms) or shifting beyond the integer width (has different behavior on some rather recent platforms).

Upvotes: 4
