Karun

Reputation: 577

accessing mmap region using structure pointer

It's possible that if I access the memory map of a file via a pointer to a structure type that has holes (padding), the structure members may not map to the correct data. For example:

#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <unistd.h>

typedef union{
    int a;
    char c[4];
}INT;

typedef struct{
    char type;
    INT data;
}RECORD;

int main(){
    int fd;
    RECORD *recPtr;
    fd = open("./f1", O_RDWR);
    if (fd == -1){
            printf("Open Failed!\n");
            return 1;
    }
    printf("Size of RECORD: %zu\n", sizeof(RECORD));
    recPtr = (RECORD *)mmap(0, 2*sizeof(RECORD), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (recPtr == MAP_FAILED){
            printf("Map Failed!\n");
            close(fd);
            return 1;
    }
    printf("type: %c, data: %c%c%c%c\n", recPtr->type, recPtr->data.c[0], recPtr->data.c[1], recPtr->data.c[2], recPtr->data.c[3]);
    munmap(recPtr, 2*sizeof(RECORD));
    close(fd);
    return 0;
}

If the file "f1" contains the following data:

012345678

The above program gives the following output:

Size of RECORD: 8
type: 0, data: 4567

because the characters 123 are swallowed by the structure hole, i.e. the padding the compiler inserts after type so that data is aligned.

Is there a way to avoid this without using the #pragma pack directive and without changing the order of the members in the structure?

Upvotes: 0

Views: 2154

Answers (2)

datenwolf

Reputation: 162299

Reading binary data directly into structures is a recipe for disaster. It means you're making assumptions about the structure of some input without verification; of course, you could check the structure for integrity afterwards. But more often than not you'll have to make architecture-dependent adjustments to the input data. Think little-endian vs. big-endian, different word lengths, packing rules, etc.

To make a long story short: don't fall for the dark side and its seductive promise of quick hacks.

The only proper way to read a file is octet by octet; you can of course read larger chunks into a buffer, but you should then process them by looking at every single bit. If you worry about performance, you should read Volume 1 and what has been released so far of Volume 4 of "The Art of Computer Programming", which explains in depth how to process data streams efficiently without neglecting any data.

Or use Google's Protocol Buffers.

Upvotes: 1

bdonlan

Reputation: 231293

You basically have the following options:

  1. Accept the padding. This is fine (and the fastest option) as long as your data does not need to be portable across architectures.
  2. Use __attribute__((packed)) or similar to control padding inserted by the compiler (recommended, but requires that you use compiler extensions)
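  3. Manually access at the byte level, without using structs. E.g.:

    char type;
    int data;

    /* memcpy needs <string.h>; offsets follow the file layout, not the struct's */
    memcpy(&type, ((char *)recPtr), 1);                  /* byte 0: type */
    memcpy(&data, ((char *)recPtr) + 1, sizeof(data));   /* next sizeof(int) bytes, in host byte order */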
    

Upvotes: 2
