Noah

Reputation: 1759

Is there a performance cost to using large mmap calls that go beyond expected memory usage?

Edit: On systems that use on-demand paging

For data structures that both persist for the duration of the program and require a dynamic amount of memory, is there any reason not to mmap an upper bound from the start?

An example is an array that will persist for the entire program's life but whose final size is unknown. The approach I am most familiar with is something along the lines of:

type * array = malloc(size);

and when the array has reached capacity doubling it with:

array = realloc(array, 2 * size);
size *= 2;
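A minimal sketch of the doubling pattern above, assuming an int element type in place of the placeholder type. The helper name grow_doubling is hypothetical and not from the original post. It assigns the result of realloc() to a temporary so that the old block is not lost if the call fails:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: double the capacity (in elements) of an int array.
 * Returns 0 on success. On failure returns -1 and leaves *array and
 * *capacity untouched, so the caller still owns the old block. */
static int grow_doubling(int **array, size_t *capacity)
{
    /* Refuse sizes whose byte count would overflow size_t. */
    if (*capacity > SIZE_MAX / (2 * sizeof **array))
        return -1;

    size_t new_cap = *capacity ? *capacity * 2 : 1;

    /* realloc(NULL, n) behaves like malloc(n), so an empty array works too. */
    int *tmp = realloc(*array, new_cap * sizeof **array);
    if (tmp == NULL)
        return -1;

    *array = tmp;
    *capacity = new_cap;
    return 0;
}
```

Existing elements are preserved across a successful call, although realloc() may have to copy them to a new location, which is the cost the question is trying to avoid.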

I understand this is probably the best way to do this if the array might be freed mid-execution so that its VM can be reused, but if it is persistent, is there any reason not to just initialize the array as follows:

array = mmap(NULL,
             huge_size,
             PROT_READ|PROT_WRITE,
             MAP_ANONYMOUS|MAP_PRIVATE|MAP_NORESERVE,
             -1, 0);

so that the elements never need to be copied.
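A minimal sketch of that reservation, assuming Linux or another system that provides MAP_ANONYMOUS and MAP_NORESERVE. The helper names reserve_region and release_region are hypothetical. The main point is that mmap() reports failure with MAP_FAILED rather than NULL, so the result must be checked against that value:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1  /* glibc: expose MAP_ANONYMOUS and MAP_NORESERVE */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve `bytes` of zero-filled address space without
 * asking the kernel to set aside swap for it. Returns NULL on failure. */
static void *reserve_region(size_t bytes)
{
    assert(bytes > 0);  /* mmap() rejects zero-length mappings */

    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0);

    /* mmap() signals failure with MAP_FAILED, not NULL. */
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: give the whole region back. Returns 0 on success. */
static int release_region(void *p, size_t bytes)
{
    return munmap(p, bytes);
}
```

With on-demand paging, only the pages the program actually touches end up backed by physical memory, and untouched parts of an anonymous mapping read as zero.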

Edit: Specifically for an OS that uses on-demand paging.

Upvotes: 2

Views: 597

Answers (3)

Willis Hershey

Reputation: 1574

If you know for a fact that wasting a chunk of memory (most likely an entire page, which is typically 4096 bytes) will not cause your program or other programs on your system to run out of memory, AND you know for a fact that your program will only ever be compiled and run on UNIX machines, then this approach is not incorrect. It is not good programming practice, though, for the following reasons:

The <stdlib.h> header you #include to use malloc() and free() in your C programs is specified by the C standard, but it is implemented specifically for your platform by the authors of its C library and operating system. Your specific system was kept in mind when these functions were written, so finding a sneaky way to make memory allocation more efficient is unlikely unless you know the inner workings of memory management in your OS better than those who wrote it.

Furthermore, the <sys/mman.h> header you include to use mmap() is not part of the C standard; it is specified by POSIX and will only compile on UNIX-like systems, which reduces the portability of your code.

There's also a really good chance (assuming a UNIX environment) that malloc() and realloc() already use mmap() behind the scenes to allocate memory for your process anyway, so it's almost certainly better to just use them. In other words, realloc() doesn't necessarily request more memory from the OS: there's a good chance your process already controls a chunk of memory that can satisfy the new request without calling mmap() again.

Hope that helps!

Upvotes: 0

that other guy

Reputation: 123660

It's fine to allocate upper bounds as long as:

  • You're building a 64-bit program: 32-bit ones have restricted virtual address space, even on 64-bit CPUs
  • Your upper bounds don't approach 2^47, as a mathematically derived one might
  • You're fine with crashing as your out-of-memory failure mode
  • You'll only run on systems where overcommit is enabled

As a side note, an end user application doing this may want to borrow a page from GHC's book and allocate 1TB up front even if 10GB would do. This unrealistically large amount will ensure that users don't confuse virtual memory usage with physical memory usage.

Upvotes: 1

Marco Bonelli

Reputation: 69437

Don't try to be smarter than the standard library, unless you 100% know what you are doing.

malloc() already does this for you. If you request a large amount of memory, malloc() will mmap() you a dedicated memory area. If what you are concerned about is the performance hit from doing size *= 2; realloc(old, size), then just malloc(huge_size) at the beginning and keep track of the actual used size in your program. There really is no point in calling mmap() yourself unless you explicitly need it for some specific reason: it is neither faster nor better in any particular way, and if malloc() thinks it's needed, it will do it for you.
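A minimal sketch of that suggestion, assuming an int element type. The struct and function names here are hypothetical. The array is allocated once at its upper bound and the program tracks how many slots are in use, so nothing is ever reallocated or copied:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container: one up-front allocation plus a used-slot counter. */
struct fixed_array {
    int *data;
    size_t used;      /* number of elements currently stored */
    size_t capacity;  /* upper bound chosen at initialization */
};

/* Allocate `capacity` elements once. Returns 0 on success, -1 on failure. */
static int fixed_array_init(struct fixed_array *a, size_t capacity)
{
    a->data = NULL;
    a->used = 0;
    a->capacity = 0;

    /* Reject empty arrays and byte counts that would overflow size_t. */
    if (capacity == 0 || capacity > SIZE_MAX / sizeof *a->data)
        return -1;

    a->data = malloc(capacity * sizeof *a->data);
    if (a->data == NULL)
        return -1;

    a->capacity = capacity;
    return 0;
}

/* Append one element. Returns -1 once the upper bound has been reached. */
static int fixed_array_push(struct fixed_array *a, int value)
{
    if (a->used == a->capacity)
        return -1;
    a->data[a->used++] = value;
    return 0;
}

/* Release the allocation and reset the bookkeeping. */
static void fixed_array_free(struct fixed_array *a)
{
    free(a->data);
    a->data = NULL;
    a->used = 0;
    a->capacity = 0;
}
```

Unlike the doubling approach, a full array is a hard error here, so the upper bound has to be chosen generously.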

Upvotes: 2
