pretzlstyle

Reputation: 2962

How to change characters in a text file using C's mmap()?

Let's say I have the standard "Hello, World! \n" saved to a text file called hello.txt. If I want to change the 'H' to an 'R' or something, can I achieve this with mmap()?

Upvotes: 5

Views: 5435

Answers (2)

Sam Arthur Gillam

Reputation: 438

Here's a working example.

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void){
       /* O_RDWR is required for a writable MAP_SHARED mapping. */
       int myFile = open("hello.txt", O_RDWR);
       if(myFile < 0){
           perror("open");
           return 1;
       }
       struct stat myStat = {0};
       if (fstat(myFile, &myStat)){
           perror("fstat");
           return 1;
       }

       off_t size = myStat.st_size;
       /* Map the whole file; mmap fails if the file is empty (size 0). */
       char *addr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, myFile, 0);
       if (addr == MAP_FAILED){
           perror("mmap");
           return 1;
       }
       if (addr[0] != 'H'){
           printf("Error: first char in file not H\n");
           return 1;
       }
       addr[0] = 'J';   /* a store into the shared mapping changes the file */
       munmap(addr, size);
       close(myFile);
       return 0;
    }

Upvotes: 3

mmap does not exist in the standard C99 (or C11) specification. It is defined in POSIX.

So assuming you have a POSIX system (e.g. Linux), you could first open(2) the file for read & write:

    int myfd = open("hello.txt", O_RDWR);
    if (myfd<0) { perror("hello.txt open"); exit(EXIT_FAILURE); }

Then you get the size (and other meta-data) of the file with fstat(2):

    struct stat mystat = {0};
    if (fstat(myfd,&mystat)) { perror("fstat"); exit(EXIT_FAILURE); }

Now the size of the file is in mystat.st_size.

    off_t myfsz = mystat.st_size;

Now we can call mmap(2). The mapping must be shared (MAP_SHARED) so that writes through the virtual address space reach the file:

    void *ad = mmap(NULL, myfsz, PROT_READ|PROT_WRITE, MAP_SHARED,
                    myfd, 0);
    if (ad == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }

Then we can overwrite the first byte (and we check that indeed the first byte in that file is H since you promised so):

    assert (*(char*)ad == 'H');
    *(char*)ad = 'R';

We might call msync(2) to ensure the file is updated right now on the disk. If we don't, it could be updated later.
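As a minimal sketch of the steps above followed by an explicit msync, something like the following could be used. The helper name `set_first_byte` is hypothetical (it is not part of any library), and error handling is reduced to a single return code.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the first byte of `path` with `c` through a
   shared mapping, then flush it with msync. Returns 0 on success, -1 on error. */
int set_first_byte(const char *path, char c)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);                  /* an empty file cannot be mapped */
        return -1;
    }

    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    p[0] = c;                       /* the store lands in the page cache */

    /* MS_SYNC blocks until the modified page has been written back. */
    int rc = msync(p, len, MS_SYNC);

    if (munmap(p, len) != 0)
        rc = -1;
    close(fd);
    return rc;
}
```

Calling `set_first_byte("hello.txt", 'R')` would then turn "Hello" into "Rello". MS_ASYNC could be passed instead of MS_SYNC to schedule the write-back without waiting for it.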

For very large mappings (notably those much larger than available RAM), we can assist the kernel (and its page cache) with hints given through madvise(2) or posix_madvise(3).
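To illustrate such a hint, here is a rough sketch that scans a read-only mapping front to back after telling the kernel the access will be sequential. The function `count_byte` is a made-up name for this example, and the hint is purely advisory, so its return value is ignored.

```c
#define _POSIX_C_SOURCE 200809L     /* ask for the POSIX declarations (posix_madvise) */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count occurrences of `c` in the file `path` by
   scanning a read-only mapping. Returns the count, or -1 on error. */
long count_byte(const char *path, char c)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* nothing to map, nothing to count */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;

    /* Advisory hint: we will read the range once, in order. */
    (void)posix_madvise(p, len, POSIX_MADV_SEQUENTIAL);

    long n = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == c)
            n++;

    munmap(p, len);
    return n;
}
```

Whether the hint helps depends on the kernel and the workload; for a file as small as hello.txt it makes no measurable difference.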

Notice that a mapping remains in effect even after a close(2). Use munmap to remove a mapping, and mprotect or mmap with MAP_FIXED on the same address range to change it.

On Linux, you could use proc(5) to query the address space. So your program could read (e.g. after fopen, using fgets in a loop) the pseudo-file /proc/self/maps (or /proc/1234/maps for the process with pid 1234).
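The fopen and fgets loop could look roughly like this. It is Linux-specific, and `dump_self_maps` is a hypothetical name chosen for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy /proc/self/maps to `out`. Returns the number of
   lines copied, or -1 if the pseudo-file cannot be opened (e.g. not Linux). */
int dump_self_maps(FILE *out)
{
    assert(out != NULL);

    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char buf[512];
    int lines = 0;
    while (fgets(buf, sizeof buf, f) != NULL) {
        fputs(buf, out);
        /* A long line may arrive in several chunks; count it only once,
           when the chunk that carries its newline is seen. */
        if (strchr(buf, '\n') != NULL)
            lines++;
    }

    fclose(f);
    return lines;
}
```

Calling `dump_self_maps(stdout)` right after the mmap call should show a line for hello.txt among the mappings of the running program.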

BTW, mmap is used by dlopen(3), and it can be called a great many times: my manydl.c program demonstrates that on Linux you could have many hundreds of thousands of dlopen-ed shared files (and so many hundreds of thousands of memory mappings).

Upvotes: 5
