David
Reputation: 397

Can I mmap a file with length greater than the size of the file?

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

I do not understand exactly how mmap works when using the MAP_PRIVATE flag. Can I pass a length greater than the size of file fd to mmap? After doing so, can I write and read the memory that exceeds the size of the file but is within length?

I am writing some code which calculates the MD5 of files. I have decided to write functions which only manipulate data as a void* and size_t len, instead of using the standard library stream functions. Before, I was using malloc and copying the files into some malloc'ed memory before using them, but that proved to be quite slow for large files, and quite stupid once I found out about mmap.

The problem that I am dealing with is that before calculating the MD5 of any data, some padding and information is appended to the data that will be hashed. With the previous solution of malloc I would just calculate how much data needs to be appended and then realloc and write. Now, I am calculating beforehand how much data needs to be appended and passing this increased length to mmap. On small files this works fine, but on large files trying to write to the addresses outside of the size of the file results in a segmentation fault.

This is vaguely what I'm trying to do:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/mman.h>
#include <sys/stat.h>


// The Data + length struct
struct data{
        void* s;
        size_t len;
};

//mmap on opened file descriptor into a data struct
struct data* data_ffile(int fd)
{
        struct data* ret = malloc(sizeof(struct data));

        //Get the length of the file
        struct stat desc;
        fstat(fd, &desc);
        ret->len = (size_t)desc.st_size;

        //Calculate the length after appending
        size_t new_len =  ret->len + 1;
        if((new_len % 64) > 56)
                new_len += (64 * 2) - (new_len % 64);
        else if((new_len % 64) <= 56)
                new_len += 64 - (new_len % 64);

        //Map the file with the increased length
        ret->s = mmap(NULL, new_len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE, fd, 0);

        if(ret->s == MAP_FAILED) exit(-1);

        return ret;
}

//Append a character to the mmap'ed data
void data_addchar(struct data* w, unsigned char c)
{
        ((char*)w->s)[w->len++] = c;
        return;
}

void md5_append(struct data* md)
{
        data_addchar(md, 0x80);

        while((md->len % 64) != 56){
                data_addchar(md, (char)0);
        }
}

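int main(int argc, char** argv)
{
        //Require a file name on the command line
        if(argc < 2){
                fprintf(stderr, "usage: %s FILE\n", argv[0]);
                return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if(fd == -1){
                perror("open");
                return 1;
        }

        //The mapping stays valid after the descriptor is closed
        struct data* in = data_ffile(fd);
        close(fd);

        //This is where the writes past the end of the file happen
        md5_append(in);
        return 0;
}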

Do I have a basic misunderstanding of mmap?

Upvotes: 3

Views: 2244

Answers (1)

Employed Russian

Reputation: 213829

Can I pass a length greater than the size of file fd to mmap? After doing so, can I write and read the memory that exceeds the size of the file but is within length?

This is all documented in the mmap POSIX specification:

The system shall always zero-fill any partial page at the end of an object. Further, the system shall never write out any modified portions of the last page of an object which are beyond its end. References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal.

  1. Yes, you can mmap a length that is greater than the size of the file, and
  2. Accessing any whole page beyond the end of the file will result in SIGBUS; only the remainder of the last (possibly partial) page of the file is accessible, and it is zero-filled.
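To make the two points concrete, here is a minimal sketch, assuming a POSIX system with a writable /tmp. The helper names accessible_len and check_tail_of_last_page are made up for illustration and are not part of any library. The first rounds the file size up to a whole page, which is how far from offset 0 a mapping can be touched without SIGBUS. The second maps a temporary file with one extra page of length and only touches the tail of the last file page. This would also explain why the code in the question works for some files and not others: the MD5 padding is fine as long as it fits in the rest of the last page, and faults once it crosses into the next one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical helper: for a file of file_size bytes mapped at offset 0,
 * return how many bytes from the start of the mapping can be accessed
 * without SIGBUS, i.e. file_size rounded up to a whole page. */
size_t accessible_len(size_t file_size, size_t page_size)
{
        assert(page_size > 0);
        return (file_size + page_size - 1) / page_size * page_size;
}

/* Hypothetical demo: create a temporary file of file_size bytes, map it
 * MAP_PRIVATE with one extra page of length, and check that the tail of
 * the last file page is zero-filled and writable.
 * Returns 0 if the checks pass, -1 on any error or failed check. */
int check_tail_of_last_page(size_t file_size)
{
        char path[] = "/tmp/mmap_tail_XXXXXX";
        int fd = mkstemp(path);
        if (fd == -1)
                return -1;
        unlink(path);   /* the file lives on until fd is closed */

        /* Fill the file with file_size bytes of 'A'. */
        char chunk[256];
        memset(chunk, 'A', sizeof chunk);
        for (size_t done = 0; done < file_size; ) {
                size_t want = file_size - done;
                if (want > sizeof chunk)
                        want = sizeof chunk;
                ssize_t n = write(fd, chunk, want);
                if (n <= 0) {
                        close(fd);
                        return -1;
                }
                done += (size_t)n;
        }

        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t safe = accessible_len(file_size, page);
        size_t map_len = safe + page;   /* longer than the file: allowed */

        unsigned char *p = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE, fd, 0);
        close(fd);
        if (p == MAP_FAILED)
                return -1;

        int ok = 1;

        /* Bytes between the end of the file and the end of its last page
         * must read as zero. */
        for (size_t i = file_size; i < safe; i++)
                if (p[i] != 0)
                        ok = 0;

        /* With MAP_PRIVATE they are also writable; the write stays in
         * memory and never reaches the file. */
        if (file_size < safe) {
                p[file_size] = 0x80;
                ok = ok && p[file_size] == 0x80;
        }

        /* p[safe] and beyond are whole pages past the end of the file.
         * They are deliberately not touched: that would raise SIGBUS. */

        munmap(p, map_len);
        return ok ? 0 : -1;
}
```

Calling check_tail_of_last_page(100) from a main should return 0. For the MD5 case, one way around the limit (a suggestion, not something the specification prescribes) is to hash the whole 64-byte blocks straight from the mapping and copy only the final partial block into a small local buffer where the padding is appended.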

Upvotes: 4
