I want to write a fast virtual machine. The bytecode is stored in a block of memory to which I have a char* pointer. Based on the instruction that is read, I want to reinterpret different parts of this memory block as different types. Do I have to use memcpy? Can't I use reinterpret_cast?
Let's say:
double r0, r1; // some registers
const char* ptr; // pointer to some raw memory block
Under what scenarios can I do:
r0 = r1 * *(const double*)(ptr + offset); // dereference the cast pointer
Or do I always have to do:
double tmp;
memcpy(&tmp, ptr + offset, sizeof(double)); // destination first, then source
r0 = r1 * tmp;
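For reference, the memcpy route can be wrapped in a small helper so the interpreter loop stays readable. This is a minimal C++17 sketch; `load` and `mul_operand` are hypothetical names, not part of any library. With optimizations enabled, GCC and Clang typically lower a fixed-size memcpy like this to a single load instruction; in an unoptimized build it may remain an actual call, which is the debug-build cost raised below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: read a T from an arbitrary byte offset of the bytecode.
// memcpy copies the object representation, so this does not depend on the
// alignment of code + offset or on which type originally wrote those bytes.
template <typename T>
T load(const char* code, std::size_t offset) {
    static_assert(std::is_trivially_copyable_v<T>, "load<T> requires a trivially copyable T");
    assert(code != nullptr);
    T value;
    std::memcpy(&value, code + offset, sizeof(T));  // destination first, then source
    return value;
}

// The question's r0 = r1 * <double at ptr + offset>, written with the helper.
double mul_operand(double r1, const char* code, std::size_t offset) {
    return r1 * load<double>(code, offset);
}
```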
This is the question. If anything other than memcpy is UB, I would like to know how this is dealt with in real-life cases, e.g. when writing VMs, interpreters, emulators, etc.
In case the answer is that I must use memcpy in all scenarios and can never do (double*)(ptr), here is my counter-point reasoning:
I understand that memcpy is guaranteed to be correct, and that it is the recommended way of doing this. But what if I want the VM to still be fast without optimizations turned on, e.g. in a debug build?
What if I can guarantee the alignment of these data types in the memory block, i.e. all floats are guaranteed to have 4-byte alignment and all doubles 8-byte alignment? What if I can guarantee that the memory block is never modified during the execution of the VM?
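If the alignment guarantees above are enforced when the bytecode is generated, a debug-build check can verify them. A minimal sketch, assuming operands are padded to their natural alignment; `is_aligned_for` and `align_up` are hypothetical helpers. Such a check only establishes alignment; in C++ it does not by itself put a double object at that address, so it addresses only one of the two concerns (alignment, aliasing).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical debug check: is p suitably aligned for a T?
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Hypothetical emitter helper: round an offset up to the next multiple of
// alignment, so a T operand written there satisfies the layout guarantee.
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0);
    return (offset + alignment - 1) / alignment * alignment;
}
```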
If all this violates the letter of the law, is illegal, UB, heresy, and the only god-intended way is memcpy, then the follow-up question is: how does it work for malloc and friends?
I can do:
double* x = (double*)malloc(sizeof(double));
*x = 5.0; // write
r0 = r1 * *x; // read from it and do some arithmetic
And no one seems to have any problems with it. If the above is legal, the code below is legal too, right? Or wrong?
char* ptr = (char*)malloc(sizeof(double));
double* x = (double*)ptr;
*x = 5.0; // write
r0 = r1 * *x; // read from it and do some arithmetic
What about this?
char* ptr = (char*)malloc(sizeof(double));
{
double* x = (double*)ptr;
*x = 5.0; // write
}
{
double* x = (double*)ptr;
r0 = r1 * *x; // read from it and do some arithmetic
}
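In C++, one way to make malloc'd storage unambiguously hold a double is to create one there with placement new and use the pointer it returns. Since C++20, P0593R6 also specifies that malloc implicitly creates objects of implicit-lifetime types such as double when that gives the program defined behavior, which, as far as I understand it, is what covers the malloc examples above. A minimal sketch; `write_then_read` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Sketch: explicitly start the lifetime of a double in malloc'd storage with
// placement new, then read through the pointer placement new returned.
double write_then_read(double r1) {
    void* raw = std::malloc(sizeof(double));
    assert(raw != nullptr);
    double* x = ::new (raw) double(5.0);  // a double object now lives at raw
    double r0 = r1 * *x;                  // read it and do some arithmetic
    std::free(raw);                       // double is trivially destructible: no destructor call needed
    return r0;
}
```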
I do not see it violating any strict-aliasing rules, but then how is it different from my initial question about r0 = r1 * *(double*)(ptr)?
If the cornerstone here is whether the memory was initially written as the same type as it is read later, then I can guarantee that this would be the case:
char* mem = (char*)malloc(size); // lots of memory
// Compiler
{
double* x = (double*)(mem + offset);
*x = 5.0; // write
}
// VM
{
double* x = (double*)(mem + offset);
r0 = r1 * *x; // read from it and do some arithmetic
}
In the above example, the address mem + offset was only ever accessed as a double. No type punning is actually involved.
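The compiler/VM split above can be sketched end to end with memcpy on both sides. This is a toy, assuming a made-up encoding of a one-byte opcode followed by an 8-byte double immediate; `Op`, `emit_mul_imm` and `run` are hypothetical. Because the immediate starts right after the opcode byte, it is deliberately misaligned, which memcpy handles without any alignment guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical opcodes, for illustration only.
enum Op : std::uint8_t { OP_HALT = 0, OP_MUL_IMM = 1 };

// "Compiler" side: append an opcode followed by the raw bytes of a double.
void emit_mul_imm(std::vector<char>& code, double imm) {
    code.push_back(static_cast<char>(OP_MUL_IMM));
    char bytes[sizeof(double)];
    std::memcpy(bytes, &imm, sizeof imm);
    code.insert(code.end(), bytes, bytes + sizeof bytes);
}

// "VM" side: each OP_MUL_IMM multiplies the accumulator by its immediate.
double run(const std::vector<char>& code, double r1) {
    double acc = r1;
    std::size_t pc = 0;
    while (pc < code.size()) {
        auto op = static_cast<std::uint8_t>(code[pc++]);
        if (op == OP_HALT) break;
        if (op != OP_MUL_IMM || code.size() - pc < sizeof(double)) {
            assert(false && "malformed bytecode");
            break;  // stop on malformed input when asserts are disabled
        }
        double imm;
        std::memcpy(&imm, code.data() + pc, sizeof imm);  // immediate may be misaligned
        pc += sizeof imm;
        acc = acc * imm;
    }
    return acc;
}
```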
Then the follow-up question: is it OK to do this?
char* mem = (char*)malloc(size);
// Compiler
{
double* x = (double*)(mem + offset);
*x = 5.0; // write
}
save_to_file(mem, size);
free(mem);
mem = (char*)malloc(size);
read_from_file(mem, size);
// VM
{
double* x = (double*)(mem + offset);
r0 = r1 * *x; // read from it and do some arithmetic
}
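A sketch of the save/reload round trip, assuming std::tmpfile stands in for the hypothetical save_to_file/read_from_file. The double is written and read with memcpy, so the example does not rely on any double object existing in the reloaded buffer; `roundtrip_through_file` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Write a double at mem + offset, save the block, reload it into a fresh
// allocation, and read the double back.
double roundtrip_through_file(double value, std::size_t offset) {
    const std::size_t size = offset + sizeof(double);
    char* mem = static_cast<char*>(std::malloc(size));
    assert(mem != nullptr);
    std::memcpy(mem + offset, &value, sizeof value);  // "compiler" writes

    std::FILE* f = std::tmpfile();  // stands in for save_to_file / read_from_file
    assert(f != nullptr);
    std::size_t written = std::fwrite(mem, 1, size, f);
    assert(written == size);
    std::free(mem);

    std::rewind(f);
    mem = static_cast<char*>(std::malloc(size));
    assert(mem != nullptr);
    std::size_t got = std::fread(mem, 1, size, f);
    assert(got == size);
    std::fclose(f);

    double x;
    std::memcpy(&x, mem + offset, sizeof x);  // "VM" reads
    std::free(mem);
    return x;
}
```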
How is this dealt with in real-life cases, e.g. when writing VMs, interpreters, emulators, etc.? I've always just done double* x = (double*)ptr and it has always worked, but according to a huge amount of information floating around, that is UB and heresy.
Just to reiterate, I'm not doing:
uint64_t a = 0;
double* x = (double*)(&a);
all I'm doing is basically serializing and deserializing data to and from a raw memory block. I'm not reinterpreting data of one type as another type (except for char*, which seems still not to violate strict aliasing rules).
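For contrast, the pattern ruled out above (viewing a uint64_t as a double) also has a defined spelling: copy the bits instead of aliasing them. A minimal sketch, assuming a 64-bit IEEE-754 double; `bits_to_double` is a hypothetical name. In C++20, std::bit_cast<double>(a) from <bit> expresses the same thing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes a 64-bit double");

// Defined-behavior alternative to `double* x = (double*)(&a);`:
// copy the object representation instead of reading a uint64_t through a double lvalue.
double bits_to_double(std::uint64_t a) {
    double d;
    std::memcpy(&d, &a, sizeof d);
    return d;
}
```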
I've read these sources:
https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8
gcc, strict-aliasing, and horror stories
gcc, strict-aliasing, and casting through a union
and others, but I still can't find a definitive answer to my question.
I also want to know the nuances, if any, between C and C++ in this regard. I would also like to know how this is dealt with in the industry, not just from the letter-of-the-law perspective of the standard. What about compilers? Are there language extensions for this?
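On the extensions question: GCC and Clang accept a may_alias type attribute, which exempts accesses through that type from type-based alias analysis, and aligned(1) on a typedef lowers the alignment the compiler assumes. This is a non-standard, compiler-specific sketch; `unaligned_double`, `load_double_ext` and `store_double_ext` are hypothetical names. The blunter alternative is compiling with -fno-strict-aliasing, as the Linux kernel does.

```cpp
#include <cassert>
#include <cstddef>

// GCC/Clang extension, not standard C++: may_alias exempts accesses through this
// type from type-based alias analysis, and aligned(1) on the typedef tells the
// compiler the address may be unaligned.
typedef double __attribute__((may_alias, aligned(1))) unaligned_double;

// Hypothetical helpers for reading and writing a double operand in the bytecode.
inline double load_double_ext(const char* code, std::size_t offset) {
    assert(code != nullptr);
    return *reinterpret_cast<const unaligned_double*>(code + offset);
}

inline void store_double_ext(char* code, std::size_t offset, double value) {
    assert(code != nullptr);
    *reinterpret_cast<unaligned_double*>(code + offset) = value;
}
```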