Reputation: 2570
People say it's not good to trust reinterpret_cast
to convert from raw data (like char*
) to a structure. For example, for the structure
struct A
{
unsigned int a;
unsigned int b;
unsigned char c;
unsigned int d;
};
sizeof(A) = 16
and __alignof(A) = 4
, exactly as expected.
Suppose I do this:
char *data = new char[sizeof(A) + 1];
A *ptr = reinterpret_cast<A*>(data + 1); // +1 is to ensure it doesn't points to 4-byte aligned data
Then copy some data to ptr
:
memcpy_s(sh, sizeof(A),
"\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00", sizeof(A));
Then ptr->a
is 1, ptr->b
is 2, ptr->c
is 3 and ptr->d
is 4.
Okay, seems to work. Exactly what I was expecting.
But the data pointed by ptr
is not 4-byte aligned like A
should be. What problems this may cause in a x86 or x64 platform? Performance issues?
Upvotes: 4
Views: 1301
Reputation: 12057
The problem is that you can not be sure if this code will run on another platform, with the next version of Visual Studio, etc. When running on another processor, it may cause a hardware exception.
There was a time when you could read out arbitrary memory locations, but all those programs crash with an "access violation" exception nowadays. Something similar could happen to this program in the future.
However, what you can do, and what any compiler that calls itself "C++ standard compliant" must compile correctly, is this:
You can reinterpret_cast
a pointer to something else, and then back to the original type. The value of the type, when read before and after, must stay the same.
I don't know what exactly you want to do, but you might get away with, for example
struct A
reinterpret_cast
ing it to char
sand restore everything later:
struct A
reinterpret_cast
it to char
sreinterpret_cast
it back to a struct A
Upvotes: 0
Reputation: 62064
For one thing, your initialization string assumes that the underlying integers are stored in little endian format. But another architecture might use big endian, in which case your string will produce garbage. (Some huge numbers.) The correct string for that architecture would be
"\x00\x00\x00\x01\x00\x00\x00\x02\x03\x00\x00\x00\x00\x00\x00\x04".
Then, of course, there is the issue of alignment.
Certain architectures won't even allow you to assign the address of data + 1 to a non-character pointer, they will issue a memory alignment trap.
But even architectures which will allow this (like x86) will perform miserably, having to perform two memory accesses for each integer in the structure. (For more information, see this excellent answer: https://stackoverflow.com/a/381368/773113)
Finally, I am not completely sure about this, but I think that C and C++ do not even guarantee to you that an array of characters will contain characters packed in bytes. (I hope someone who knows more might clarify this.) Conceivably, there can be architectures which are completely incapable of addressing non-word-aligned data, so in such architectures each character would have to occupy an entire word. This would mean that it would be valid to take the address of data + 1
, because it would still be aligned, but your initialization string would be unsuitable for the intended job, as the first 4 characters in it would cover your entire structure, producing a=1, b=0, c=0 and d=0.
Upvotes: 3