Carlos Vega

Reputation: 1371

Handle memory properly with a pool of structs

I have a program with three pools of structs. For each of them I keep a list of used structs and another for the unused ones. During execution the program consumes structs and returns them to the pool on demand. There is also a garbage collector that reclaims "zombie" structs and returns them to the pool.

At the beginning of the execution, the virtual memory, as expected, shows around 10 GB allocated, and as the program uses the pool, the RSS memory increases.

Although the used nodes are back in the pool, marked as unused, the RSS memory does not decrease. I expected this, because the OS doesn't know what I'm doing with the memory; it cannot tell whether I'm really using those pages or just managing a pool.

What I would like is to force the unused memory back to virtual memory whenever I want, for example when the RSS grows above X GB.

Is there any way to mark, given a pointer, a memory area so that it is moved back to virtual memory? I know this is the operating system's responsibility, but maybe there is a way to force it.

Maybe I shouldn't care about this, what do you think?

Thanks in advance.

I provide a picture of the pool usage vs. the memory usage, for a few files. As you can see, the sudden drops in pool usage are due to the garbage collector; what I would like to see is that drop reflected in the memory usage.

Struct Pools Usage & Memory Usage

Upvotes: 5

Views: 530

Answers (2)

VonC

Reputation: 1323743

Git 2.19 (Q3 2018) offers an example of a memory pool of structs, using mmap, not malloc.

For a large tree, the index needs to hold many cache entries allocated on heap.
These cache entries are now allocated out of a dedicated memory pool to amortize malloc(3) overhead.

See commit 8616a2d, commit 8e72d67, commit 0e58301, commit 158dfef, commit 8fb8e3f, commit a849735, commit 825ed4d, commit 768d796 (02 Jul 2018) by Jameson Miller (jamill). (Merged by Junio C Hamano -- gitster -- in commit ae533c4, 02 Aug 2018)

block alloc: allocate cache entries from mem_pool

When reading large indexes from disk, a significant portion of the time is spent in malloc() calls.
This can be mitigated by allocating a large block of memory up front and managing it ourselves via memory pools.

This change moves the cache entry allocation to be on top of memory pools.

Design:

The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated.
When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be.
When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for.
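The design above can be sketched with a minimal bump-allocating pool in C. This is an illustrative sketch, not Git's actual mem_pool implementation: one large block is carved into entries, and discarding the pool frees every entry at once, tying each entry's lifetime to the pool's (alignment and growth are ignored for brevity).

```c
#include <stdlib.h>

/* Hypothetical bump-allocating pool: entries are carved out of one
 * large block; the whole block is freed at once on discard. */
typedef struct mem_pool {
    char  *block;     /* the single large allocation */
    size_t capacity;  /* total bytes available */
    size_t used;      /* bytes handed out so far */
} mem_pool;

static void mem_pool_init(mem_pool *p, size_t capacity)
{
    p->block = malloc(capacity);
    p->capacity = capacity;
    p->used = 0;
}

static void *mem_pool_alloc(mem_pool *p, size_t size)
{
    if (p->block == NULL || p->used + size > p->capacity)
        return NULL;          /* a real pool would grow a new block here */
    void *ptr = p->block + p->used;
    p->used += size;
    return ptr;
}

static void mem_pool_discard(mem_pool *p)
{
    free(p->block);           /* frees every entry in one call */
    p->block = NULL;
    p->capacity = p->used = 0;
}
```

Knowing the number and size of the on-disk entries lets the caller pass a good initial `capacity`, which is exactly the guidance the commit message describes.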

In the case of a Split Index, the following rules are followed.
1st, some terminology is defined:

Terminology:

  • 'the_index': represents the logical view of the index
  • 'split_index': represents the "base" cache entries. Read from the split index file.

'the_index' can reference a single split_index, as well as cache_entries from the split_index. the_index will be discarded before the split_index is.
This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the split_index's memory pool.

This allows us to follow the pattern that the_index can reference cache_entries from the split_index, and that the cache_entries will not be freed while they are still being referenced.

Managing transient cache_entry structs:

Cache entries are usually allocated for an index, but this is not always the case. Cache entries are sometimes allocated because this is the type that the existing checkout_entry function works with.
Because of this, the existing code needs to handle cache entries associated with an index / memory pool, and those that only exist transiently.
Several strategies were contemplated around how to handle this.

Chosen approach:

An extra field was added to the cache_entry type to track whether the cache_entry was allocated from a memory pool or not.
This is currently an int field, as there are no more available bits in the existing ce_flags bit field.
If / when more bits are needed, this new field can be turned into a proper bit field.

We decided tracking and iterating over known memory pool regions was less desirable than adding an extra field to track this state.
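The chosen approach can be sketched as follows. The field and function names here are illustrative, not Git's actual cache_entry layout: each entry records whether it came from a pool, so a single discard routine can free transient entries while leaving pool-allocated ones to be reclaimed in bulk with their pool.

```c
#include <stdlib.h>

/* Illustrative sketch: an extra int field records the entry's origin,
 * since (per the commit message) no bits are left in ce_flags. */
struct cache_entry {
    unsigned int ce_flags;
    int mem_pool_allocated;   /* 1 if carved from a pool, 0 if transient */
    /* ... payload fields ... */
};

/* Transient entries are heap-allocated individually. */
static struct cache_entry *make_transient_entry(void)
{
    struct cache_entry *ce = calloc(1, sizeof(*ce));
    if (ce)
        ce->mem_pool_allocated = 0;
    return ce;
}

/* One discard path for both kinds of entry. */
static void discard_entry(struct cache_entry *ce)
{
    if (ce && !ce->mem_pool_allocated)
        free(ce);             /* transient: free it now */
    /* pool-allocated entries are reclaimed when the pool is discarded */
}
```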

Upvotes: 1

Vality

Reputation: 6607

You can do this as long as you are allocating your memory via mmap and not via malloc. You want to use madvise(2) with MADV_DONTNEED (or the portable posix_madvise with POSIX_MADV_DONTNEED).

Just remember to run madvise with MADV_WILLNEED (or POSIX_MADV_WILLNEED) before using the pages again, as a hint to the kernel to fault memory back in behind them.

This does not actually guarantee the pages will be swapped out but gives the kernel a strong hint to do so when it has time.

Upvotes: 3
