Neet33

Reputation: 281

Reading struct from mmap

typedef struct aaa {
  int a;
  int b;
  long ptr_to_st2; //offset from the beginning of the file.
} st1;

typedef struct bbb {
  int get;
  char it;
} st2;

I have a binary file mapped to memory using mmap. The file contains st1 at the beginning of the file and then some data and then st2.

unsigned char *filemap; //mmap
st1 *first=(st1 *)filemap;
st2 *second=(st2 *)(filemap+first->ptr_to_st2); //parenthesised so the offset is counted in bytes, not in units of sizeof(st2)
printf("%c",second->it);

I've been told this code is incorrect and violates the strict aliasing rule. What is the correct way to write this code? Thanks.

Upvotes: 0

Views: 451

Answers (1)

autistic

Reputation: 15642

To put it simply, int has an alignment requirement. Supposing sizeof (int) is two on your machine, and we look at your memory as a sequence of blocks:

[a][a][b][b][c][c][d][d]...

We can store an int in the [a] blocks, the [b] blocks and so on... Basically at every second address... but not between them.

On our common household machines, we may in fact be able to store them in between, but this comes at a performance cost; the bus is still aligned to retrieve integers that satisfy the alignment requirement, so there'll be two retrievals via the bus for every one misaligned integer. That is undesired.

On less common machines (such as old Apples, or devices we rarely program for directly, such as just about every router on the planet), such a misaligned access causes a condition similar to a segfault, known as a bus error. That is definitely undesired!
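
To make the alignment requirement concrete, here is a minimal sketch of how one might check whether an offset or a pointer is suitably aligned before reinterpreting it. The helper names offset_is_aligned and pointer_is_aligned_for_st2 are hypothetical, and the pointer check assumes a platform where converting a pointer to uintptr_t yields a plain byte address (true on common machines, but not guaranteed by the standard).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

typedef struct bbb {
  int get;
  char it;
} st2;

/* Hypothetical helper: returns 1 if `offset` bytes past a suitably aligned
   base is itself a multiple of `alignment` (which must be a power of two). */
static int offset_is_aligned(size_t offset, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset & (alignment - 1)) == 0;
}

/* Hypothetical helper: returns 1 if `p` satisfies the alignment of st2.
   Assumes uintptr_t holds a flat byte address on this platform. */
static int pointer_is_aligned_for_st2(const void *p) {
    return ((uintptr_t)p % alignof(st2)) == 0;
}
```

If a check like this fails for filemap + first->ptr_to_st2, casting that address to st2 * and dereferencing it is exactly the misaligned access described above.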


If you serialise and deserialise your information properly (as opposed to just using typecasts to reinterpret parts of the array), you won't see any of these problems. That is, translate your structures byte by byte, for example:

void serialise_st1(void *destination, st1 *source) {
    unsigned char *d = destination;
    unsigned long  s = (unsigned int) source->a;

    d[0] = s >> 8;
    d[1] = s;

    s = (unsigned int) source->b;
    d[2] = s >> 8;
    d[3] = s;

    s = source->ptr_to_st2;
    d[4] = s >> 24;
    d[5] = s >> 16;
    d[6] = s >> 8;
    d[7] = s;
}

Notice how I wrote each byte manually, most significant byte first? The deserialisation process is a little tougher due to the need to handle the sign, but it is essentially the reverse: rather than assigning to each byte individually, we read each byte individually.

void deserialise_st1(st1 *destination, void *source) {
    unsigned char *s = source;
    *destination = (st1) { .a = (s[0] <= 127 ? s[0] : -(256 - s[0])) * 0x0100
                              +  s[1],
                           .b = (s[2] <= 127 ? s[2] : -(256 - s[2])) * 0x0100
                              +  s[3],
                           .ptr_to_st2 = (s[4] <= 127 ? s[4] : -(256 - s[4])) * 0x01000000
                                       +  s[5] * 0x00010000
                                       +  s[6] * 0x00000100
                                       +  s[7] };
}
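
The sign handling is the least obvious part, so here it is pulled out on its own as a small sketch. The helper name decode_be16_signed is hypothetical; it assumes the same layout as the code above (two bytes, most significant first, two's complement).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode a 2-byte big-endian two's-complement value.
   The first byte carries the sign: values above 127 stand for negative
   numbers, so 0xFF becomes -1, 0x80 becomes -128, and so on. */
static long decode_be16_signed(const unsigned char *bytes) {
    assert(bytes != NULL);
    long high = bytes[0] <= 127 ? (long)bytes[0] : -(256L - bytes[0]);
    return high * 0x0100 + bytes[1];
}
```

For example, the bytes 0xFF 0xFE decode as -1 * 256 + 254, which is -2, and 0x01 0x02 decode as 258.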

Then, adapting upon your example:

unsigned char *filemap;
st1 first;
deserialise_st1(&first, filemap);

I'll leave it as an exercise for you to write deserialise_st2, but feel free to ask if you have any problems doing so.

st2 second;
deserialise_st2(&second, filemap + first.ptr_to_st2);
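
If you get stuck on that exercise, one possible sketch follows. The on-disk layout of st2 is not given in the question, so this assumes get is stored as two bytes (most significant first, as in deserialise_st1) immediately followed by one byte for it; adjust it to whatever your file actually contains.

```c
#include <assert.h>
#include <stddef.h>

typedef struct bbb {
  int get;
  char it;
} st2;

/* Sketch only. Assumed layout: bytes 0-1 hold `get` as big-endian
   two's complement, byte 2 holds `it`. `source` may be unaligned,
   because it is only ever read one byte at a time. */
void deserialise_st2(st2 *destination, const void *source) {
    assert(destination != NULL && source != NULL);
    const unsigned char *s = source;
    int high = s[0] <= 127 ? s[0] : -(256 - s[0]);
    destination->get = high * 0x0100 + s[1];
    destination->it = (char)s[2];
}
```

Because the source is read through unsigned char, it does not matter whether the offset stored in ptr_to_st2 is a multiple of the alignment of st2.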

Assuming your code goes on to update first or second, and you want to push those updates into your filemap, you would need to know the offset each one came from. That is, you'll want to associate filemap as the pointer to first (first_ptr), and filemap + first.ptr_to_st2 as the pointer to second (second_ptr). Then:

serialise_st1(first_ptr, &first);
serialise_st2(second_ptr, &second);
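
serialise_st2 isn't shown above either. A minimal sketch, mirroring serialise_st1 and assuming a layout the question does not confirm (two bytes for get, most significant first, followed by one byte for it), might look like this:

```c
#include <assert.h>
#include <stddef.h>

typedef struct bbb {
  int get;
  char it;
} st2;

/* Sketch only. Assumed layout: bytes 0-1 receive the low 16 bits of `get`
   (most significant byte first), byte 2 receives `it`. */
void serialise_st2(void *destination, const st2 *source) {
    assert(destination != NULL && source != NULL);
    unsigned char *d = destination;
    unsigned long  s = (unsigned int) source->get;

    d[0] = (unsigned char)(s >> 8);
    d[1] = (unsigned char)s;
    d[2] = (unsigned char)source->it;
}
```

As with serialise_st1, any value of get that does not fit in 16 bits is truncated by this layout.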

Upvotes: 1
