Reputation: 3367

What is the optimal amount to malloc at a given time when the total needed is not known?

I've implemented a multi-level cache simulator that needs to store the values currently in the simulator. With current configurations, the maximum size of all values being stored could reach 2G. Obviously I'm not going to assume this worst case scenario and allocate all of that memory up-front. Instead, I have the program set to allocate memory as needed in chunks. The expense of this allocation is exacerbated by the fact that I'm callocing in order to provide 0 values when no write has occurred previously at the specified location.

My question is, is there a good heuristic for how much memory should be allocated each time more is needed? Currently I'm using an arbitrary value and I considered some solution that would use some ratio of the total system memory (I presume it's possible to dynamically detect this at compile and/or runtime), but even with the latter I'm using an arbitrary ratio with still doesn't sit well with me.

Any insight into best practices for this kind of situation would be appreciated!

Upvotes: 5

Answers (2)

justin

Reputation: 104698

It's best to understand allocation patterns of your program, if this is a problem you need to optimize for. This comes by understanding the program's implementation, the architecture(s) it runs within, and by observation (e.g. time and memory profiling).

The truth is, you can optimize from many perspectives, but things change over time (inputs change, environments change). In the user-land, your memory usage is already second guessed.

Given your allocation sizes, I assume you are already depending on a system which will default to a backing store as needed. As such, you don't have much control over what is paged or when. Peeking at available physical memory is not worth consideration in this case, and you will have to work hard to do better than the system's existing virtual memory implementation. Several of these systems try to use all available memory (e.g. "Unused RAM is wasted RAM").

Having said that and if those assumptions are correct: It's often better to just reduce your allocation sizes and working sets and do I/O yourself as needed.

Your OSs probably use disk caching as well; reads and writes are probably faster than you suspect for large blocks of memory.

Even deeper: Use virtual memory or memory mapped files for these large data sets. Your kernel will likely handle these cases very well.

Obviously I'm not going to assume this worst case scenario and allocate all of that memory up-front.

Then you will likely be surprised to learn that a 2 GB calloc alone may be better than other alternatives people come up with in some environments because a large calloc could just reserve a domain in virtual memory, loading/initializing pages only when you access them. Depending on your usage, this approach will be much better than some alternatives you may be given.

A good starting point for many problems when understanding a program or input's allocation patterns is to start out conservative, and then make the most beneficial adjustments based on observation. In many cases, you will need little more information than a) accurately determining how much to resize by when resizing is necessary b) reusing allocations where appropriate c) designing your data well for the problem at hand.

Upvotes: 0

phs

Reputation: 11051

A common rule of thumb is to grow geometrically, for example by doubling, on each reallocation.

Upvotes: 1

What is the optimal amount to malloc at a given time when the total needed is not known?

Answers (2)

Related Questions