Reputation: 973
Apologies if this has been asked before; I can't find a question that fully answers what I want to know. Similar questions mention ways to do this, but don't compare the approaches.
I am writing a program in C++ to solve a PDE to steady state. I don't know how many time steps this will take, so I don't know how long my time arrays will be. The maximum time will be 100,000 s, but the time step could be as small as 0.001 s, so the arrays could be as many as 1e8 doubles in length in the worst case (not necessarily a rare case either).
What is the most efficient way to implement this in terms of memory allocated and running time?
Options I've looked at:
- std::vector and its size-increasing functionality
Are there any other options?
I'm primarily concerned with speed, but I want to know what memory considerations come into it as well.
Upvotes: 3
Views: 1310
Reputation:
As noted in the comments, if you are careful with vector, you can reserve the capacity to store the maximum input size in advance (1e8 doubles) without paging in any memory.
For this you want to avoid the fill constructor and methods like resize (which would end up accessing all the memory) and instead use reserve and push_back to fill it, touching memory only as needed. That allows most operating systems to simply page in chunks of your accessed vector at a time instead of the entire contents all at once.
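For illustration, a minimal sketch of that reserve-and-push_back pattern, assuming the worst-case step count and a placeholder convergence check (neither is from the original question):

```cpp
#include <cstddef>
#include <vector>

int main() {
    const std::size_t max_steps = 100'000'000;  // worst case: 100,000 s / 0.001 s

    std::vector<double> times;
    times.reserve(max_steps);  // capacity only; no element is written yet

    double t = 0.0;
    const double dt = 0.001;
    bool steady = false;
    while (!steady && times.size() < max_steps) {
        times.push_back(t);   // pages are touched only as elements are appended
        t += dt;
        steady = (t >= 1.0);  // placeholder for the real steady-state check
    }
}
```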
Yet I tend to avoid this solution for the most part at these kinds of input scales, for a few simple reasons. Some of them may simply be paranoia, but at least on operating systems like Windows, a debug build initializes the memory in its entirety early, so the allocated pages end up mapped to DRAM immediately on reserving capacity for such a vector. That can lead to a slight startup delay and a task manager showing 800 megabytes of memory usage for a debug build before we've done anything.
While the efficiency of a debug build should generally be a minor concern, when the potential discrepancy between release and debug is enormous, it can render production code almost impossible to debug effectively. So when the differences are potentially this vast, my preference is to "chunk it up".
The strategy I like to apply here is to allocate smaller chunks -- smaller arrays of N elements, where N might be, say, 512 doubles (just snug enough to fit a common denominator page size of 4 kilobytes -- possibly minus a couple of doubles for chunk metadata). We fill a chunk with elements, and when it gets full, we create another chunk.
With these chunks, we can aggregate them either by linking them (forming an unrolled list) or by storing pointers to them in a separate aggregate, depending on whether random access is needed or sequential access will suffice. For the random-access case this incurs a slight overhead, yet one I've tended to find relatively small at these input scales, where times are often dominated by the upper levels of the memory hierarchy rather than by register- and instruction-level work.
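As a rough sketch of what "chunking it up" could look like -- the class name, chunk size, and interface here are mine, not the answerer's:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Grows in fixed-size chunks, so only the chunks actually used are allocated and touched.
class ChunkedDoubles {
public:
    static const std::size_t kChunkSize = 512;  // 512 doubles == 4 KB, one typical page

    void push_back(double value) {
        if (chunks_.empty() || used_in_last_ == kChunkSize) {
            chunks_.emplace_back(new double[kChunkSize]);  // start a new chunk when the last is full
            used_in_last_ = 0;
        }
        chunks_.back()[used_in_last_++] = value;
    }

    std::size_t size() const {
        return chunks_.empty() ? 0 : (chunks_.size() - 1) * kChunkSize + used_in_last_;
    }

    // Random access: pick the chunk, then the offset within it.
    double operator[](std::size_t i) const {
        return chunks_[i / kChunkSize][i % kChunkSize];
    }

private:
    std::vector<std::unique_ptr<double[]>> chunks_;  // aggregate of pointers to chunks
    std::size_t used_in_last_ = 0;                   // elements used in the last chunk
};
```

The separate vector of chunk pointers gives random access; linking the chunks to each other instead would give the unrolled-list variant for purely sequential use.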
This might be overkill for your case, and careful use of vector may be the best bet. Yet if that doesn't suffice and you have similar concerns/needs to mine, this kind of chunky solution might help.
Upvotes: 1
Reputation: 8431
The only way to know which option is 'most efficient' on your machine is to try a few different options and profile. I'd probably start with the following:
- std::vector constructed with the maximum possible size.
- std::vector constructed with a conservative ballpark size, then push_back as needed.
- std::deque and push_back.

The std::vector vs std::deque debate is ongoing. In my experience, when the number of elements is unknown and not too large, std::deque is almost never faster than std::vector (even if the std::vector needs multiple reallocations), but it may end up using less memory. When the number of elements is unknown and very large, std::deque memory consumption seems to explode and std::vector is the clear winner.
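For concreteness, a minimal sketch of those three candidates side by side -- the sizes are assumptions taken from the question's worst case and are only there to have something to profile:

```cpp
#include <cstddef>
#include <deque>
#include <vector>

int main() {
    const std::size_t max_size = 100'000'000;  // worst case from the question

    // 1. vector constructed with the maximum possible size (value-initializes every element).
    std::vector<double> a(max_size);

    // 2. vector with a conservative ballpark reserved, grown with push_back.
    std::vector<double> b;
    b.reserve(10'000'000);
    b.push_back(0.0);

    // 3. deque grown with push_back (allocates fixed-size blocks internally).
    std::deque<double> c;
    c.push_back(0.0);
}
```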
If, after profiling, none of these options offers satisfactory performance, you may want to consider writing a custom allocator.
Upvotes: 0
Reputation: 11968
If you are concerned about speed, just allocate 1e8 doubles and be done with it.
In most cases vector should work just fine. Remember that the append is amortized O(1).
Unless you are running on something very weird, the OS memory allocation should take care of most fragmentation issues and of the fact that it's hard to find an 800 MB contiguous free memory block.
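As a small illustration of that amortized-O(1) growth (capacity growth factors vary by implementation; this just counts how few reallocations actually happen for 1e8 appends):

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> v;
    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();

    for (std::size_t i = 0; i < 100'000'000; ++i) {
        v.push_back(static_cast<double>(i));
        if (v.capacity() != capacity) {   // capacity grows geometrically, so this fires rarely
            ++reallocations;
            capacity = v.capacity();
        }
    }
    std::printf("reallocations: %zu\n", reallocations);  // typically a few dozen at most
}
```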
Upvotes: 3