Reputation: 7824
I have written a computer simulation in C++ that needs a lot of memory. It runs in iterations, and in each iteration it allocates a large amount of memory that should be freed at the end of the iteration. It also uses C++11's implementation of <thread> to run work in parallel.
When I test the program on my desktop machine, it behaves fine: it never exceeds the memory I allow it, and nothing accumulates over time or across iterations. When I submit the program to our computation cluster, however, the used memory (which I can observe only through the queuing software) grows with time and far exceeds the memory used on my machine.
Let me first show you very roughly how the software is structured:
for thread in n_threads:
    vector<Object> container;
    for iteration in simulation:
        container.clear()
        container.resize(a_large_number)
        ... do stuff ...
Let's say that on my machine the container eats up 2GB of memory. I can see both in htop and in valgrind --tool=massif that these 2GB are never exceeded. Nothing piles up. On the cluster, however, I can see the memory grow and grow until it becomes much more than the 2GB (and the jobs are killed or the computation node freezes). Note that I limit the number of threads on both machines and can be sure that they are equal.
What I do know is that the libc on the cluster is very old. To compile my program, I needed to compile a new version of g++ and update the libc on the front node of the cluster. The software does run fine on the computation nodes (except for this memory issue), but the libc is much older there. Could this be an issue for memory allocation, especially together with threading? How could I investigate that?
Upvotes: 0
Views: 206
Reputation: 4517
Yes, depending on how old the GNU libc is, you might be missing some important memory allocation optimizations. Here are some things to try out (needless to say, at the risk of performance penalties):
You can try tweaking the malloc/free behaviour through mallopt(): use the M_MMAP_MAX and M_MMAP_THRESHOLD options to encourage more allocations to go through mmap(). Memory obtained that way is returned to the system as soon as it is passed to free().
Try making your container's allocator __gnu_cxx::malloc_allocator, to ensure the mallopt() tweaks affect the container.
Try calling container.shrink_to_fit() after the resize, to make sure the vector is not holding on to more memory than strictly needed. Note that shrink_to_fit() is only a non-binding request, so check capacity() afterwards.
Upvotes: 1