Reputation: 548

std::vector increasing peak memory

This is a continuation of my last question. I am failing to understand the memory taken up by a vector. Problem skeleton:

Consider a vector that is a collection of lists, where each list is a collection of pointers. Exactly like:

std::vector<std::list<ABC*> > vec;

where ABC is my class. We work on 64-bit machines, so the size of a pointer is 8 bytes.

At the start of my flow in the project, I resize this vector to a number so that I can store lists at their respective indexes.

vec.resize(613284686);

At this point, capacity and size of the vector would be 613284686. Right. After resizing, I insert the lists at the corresponding indexes as:

// Somewhere down in the program, make these lists. Simple push for now.
std::list<ABC*> l1;
l1.push_back(<pointer_to_class_ABC>);
l1.push_back(<pointer_to_class_ABC>);

// Copy the list to its location (valid indexes are 0 .. vec.size() - 1)
setInfo(613284685, l1);

// Appends the pointers in `list` to the list stored at vec[index].
void setInfo(uint64_t index, const std::list<ABC*>& list) {
  std::copy(list.begin(), list.end(), std::back_inserter(vec.at(index)));
}

Alright. So inserting is done. Notable things are:

Size of vector is: 613284686
Entries in the vector is: 3638243731 // Calculated by going over the vector indexes and adding up the size of the std::list at each index.

Now, since there are 3638243731 entries of pointers, I would expect memory taken by this vector is ~30Gb. 3638243731 * 8(bytes) = ~30Gb.

BUT when I have this data in memory, memory usage peaks at 400G.

And then I clear up this vector with:

std::vector<std::list<nl_net> >& ccInfo = getVec(); // getVec defined somewhere and return me original vec.
std::vector<std::list<nl_net> >::iterator it = ccInfo.begin();
for(; it != ccInfo.end(); ++it) {
  (*it).clear();
}

ccInfo.clear(); // Since it is a reference
std::vector<std::list<nl_net> >().swap(ccInfo); // This makes the capacity of the vector 0.

Well, after clearing up this vector, memory drops down to 100G. That is too much memory for a vector to keep holding.

Could you point out what I am failing to understand here?

P.S. I cannot reproduce it on smaller cases; it only shows up in my project.

Upvotes: 4

Answers (3)

eerorika

Reputation: 238411

vec.resize(613284686);
At this point, capacity and size of the vector would be 613284686

It would be at least 613284686. It could be more.

std::vector<std::list<nl_net> >().swap(ccInfo); // This makes the capacity of the vector 0.

Technically, there is no guarantee by the standard that a default constructed vector wouldn't have capacity other than 0... But in practice, this is probably true.

Now, since there are 3638243731 entries of pointers, I would expect memory taken by this vector is ~30Gb. 3638243731 * 8(bytes)

But the vector doesn't contain pointers. It contains std::list<ABC*> objects. So, you should expect vec.capacity() * sizeof(std::list<ABC*>) bytes used by the buffer of the vector itself. Each list object holds at least a pointer to the beginning and to the end, even when the list is empty.
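To put a number on the buffer of the vector itself, here is a minimal sketch. The empty `ABC` struct and the helper name `vector_buffer_bytes` are placeholders of mine, not from the question. On common 64-bit standard libraries sizeof(std::list<ABC*>) is typically 16 or 24 bytes, so 613284686 empty lists would already take roughly 10 to 15 GB before a single pointer is stored; check the value on your own toolchain.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>

struct ABC {};  // placeholder for the question's class

// Bytes occupied by the vector's own buffer for `capacity` list slots,
// before anything has been pushed into any of the lists.
std::uint64_t vector_buffer_bytes(std::uint64_t capacity) {
    return capacity * static_cast<std::uint64_t>(sizeof(std::list<ABC*>));
}
```

Calling vector_buffer_bytes(vec.capacity()) on the real vector gives the size of that one allocation; the list nodes come on top of it.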

Furthermore, you should expect each element in each of the lists to use memory as well. Since the list is doubly linked, you should expect about two pointers plus the data (a third pointer) worth of memory for each element.

Also, each pointer in the lists apparently points to an ABC object, and each of those use sizeof(ABC) memory as well.

Furthermore, each element of the linked lists is allocated separately. Each dynamic allocation requires book-keeping so that it can be individually de-allocated, each allocation must be aligned to the maximum native alignment, and the free store may have fragmented during the execution. So there will be a lot of overhead associated with each dynamic allocation.
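If you want to see what a list node costs on your implementation, one option is a counting allocator. The sketch below is only an illustration: `CountingAllocator`, `bytes_per_node` and the empty `ABC` struct are names I made up. It reports the bytes the list requests per node; the allocator's own book-keeping and alignment padding are not visible at this level and come on top.

```cpp
#include <cstddef>
#include <list>
#include <new>

struct ABC {};  // placeholder for the question's class

// Running totals of what the list asks its allocator for.
static std::size_t g_bytes_requested = 0;
static std::size_t g_allocations = 0;

// Minimal allocator that forwards to operator new and counts requests.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        g_bytes_requested += n * sizeof(T);
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

// Average bytes requested per allocation when `count` pointers are pushed.
std::size_t bytes_per_node(std::size_t count) {
    g_bytes_requested = 0;
    g_allocations = 0;
    std::list<ABC*, CountingAllocator<ABC*>> l;
    for (std::size_t i = 0; i < count; ++i) {
        l.push_back(nullptr);
    }
    return g_allocations ? g_bytes_requested / g_allocations : 0;
}
```

On a typical 64-bit implementation I would expect this to report 24 bytes (two links plus the stored pointer), i.e. three times the 8 bytes assumed in the question, but measure it rather than trusting my guess.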

Well, after clearing up this vector, memory drops down to 100G.

It is quite typical for the language implementation to retain (some) memory it has allocated from the OS. If your target system documents an implementation specific function for explicitly requesting release of such memory, then you could attempt using that.
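As one example of such an implementation-specific function: glibc provides malloc_trim in <malloc.h>. The wrapper below is a sketch under the assumption that you are on Linux with glibc (the question doesn't say); `release_free_heap_to_os` is a name I chose. Other platforms have different mechanisms or none.

```cpp
#include <cstdlib>
#if defined(__GLIBC__)
#include <malloc.h>  // glibc-specific: declares malloc_trim
#endif

// Asks the allocator to return free heap memory to the OS.
// Returns true if glibc reports that some memory was released,
// false otherwise (including on platforms without malloc_trim).
bool release_free_heap_to_os() {
#if defined(__GLIBC__)
    return malloc_trim(0) != 0;
#else
    return false;
#endif
}
```

You would call it once after the big vector has been destroyed. Whether it helps depends on how fragmented the heap is at that point.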

However, if the vector buffer wasn't the latest dynamic allocation, then its deallocation may have left a massive reusable area in the free store, but if later allocations exist, then all that memory might not be releasable back to the OS.

Even if the language implementation has released the memory to the OS, it is quite typical for the OS to keep the memory mapped for the process until another process actually needs the memory for something else. So, depending on how you're measuring memory use, the results might not necessarily be meaningful.

General rules of thumb that may be useful:

Don't use a vector unless you use all (or most) of the indices. If you don't, consider a sparse array instead (there is no standard container for such a data structure though).
When using vector, reserve before resize if you know the upper bound of allocation.
Don't use linked lists without a good reason.
Don't rely on getting all memory back from peak usage (back to the OS, that is; the memory is still usable for further dynamic allocations).
Don't stress about virtual memory usage.
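For the first and third rules, a sparse layout could look roughly like the sketch below. This is only an illustration: `SparseInfo` and its members are hypothetical names, and it pays off only if a large share of the 613284686 indices stay empty, which the question doesn't tell us.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct ABC {};  // placeholder for the question's class

// Only indices that actually receive data get an entry, and each entry
// stores its pointers contiguously instead of in a linked list.
class SparseInfo {
public:
    void add(std::uint64_t index, ABC* p) { data_[index].push_back(p); }

    // Number of pointers stored at `index`; 0 if it was never touched.
    std::size_t count(std::uint64_t index) const {
        auto it = data_.find(index);
        return it == data_.end() ? 0 : it->second.size();
    }

    std::size_t used_indices() const { return data_.size(); }

private:
    std::unordered_map<std::uint64_t, std::vector<ABC*>> data_;
};
```

Note that a hash map has per-entry overhead of its own, so if nearly every index is used, a plain vector of vectors is likely the smaller option.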

Upvotes: 6

Nathilion

Reputation: 329

The main vector needs some more consideration. I get the impression it will always be a fixed size. So why not use a std::array instead? A std::vector may allocate more memory than it currently needs, to allow for growth. The bigger your vector, the bigger the reservation of memory to allow for further growth. The reasoning behind this is to keep relocations in memory to a minimum. Relocations of really big vectors take huge amounts of time, so extra memory is reserved to prevent them.

No vector function that removes elements (such as vector::clear and vector::erase) also deallocates memory (i.e. lowers the capacity). The size will decrease but the capacity doesn't. Again, this is meant to prevent relocations; if you delete, you are also very likely to add again. vector::shrink_to_fit also doesn't guarantee that all of the unused memory is released.*
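A small sketch to see this for yourself; `CapacityReport` and `clear_then_swap` are names I made up for the illustration. It records the capacity after clear() and again after swapping with an empty temporary, which is the trick used in the question.

```cpp
#include <cstddef>
#include <vector>

struct CapacityReport {
    std::size_t after_clear;  // capacity right after clear()
    std::size_t after_swap;   // capacity after swapping with an empty vector
};

CapacityReport clear_then_swap(std::size_t n) {
    std::vector<int> v(n);
    CapacityReport r;

    v.clear();                   // size becomes 0, the buffer is kept
    r.after_clear = v.capacity();

    std::vector<int>().swap(v);  // v takes over the temporary's (empty) buffer
    r.after_swap = v.capacity();
    return r;
}
```

After clear() the capacity should still be at least n; after the swap it is 0 on the implementations I know of, although the standard doesn't strictly promise that.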

Next is the choice of a list to store elements. Is a list really applicable? Lists are strong at insertion and removal at arbitrary positions (not at random access, which is linear). Are you really constantly adding and removing ABC objects at random locations in the list? Or is another container type with different properties but with contiguous memory more suitable? Another std::vector or std::array perhaps. If the answer is yes, then you're pretty much stuck with a list and its scattered memory allocations. If no, then you could win back a lot of memory by using a different container type.
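If the data is built once and then only read, one contiguous alternative is a flat layout: a single array with all pointers plus an offsets array saying where each index starts. The sketch below assumes exactly that usage pattern, and `FlatInfo` and `build_flat` are hypothetical names. With the numbers from the question that would be about 29 GB of pointers plus about 5 GB of offsets, in two allocations instead of billions.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct ABC {};  // placeholder for the question's class

// entries[offsets[i]] .. entries[offsets[i + 1] - 1] belong to index i.
struct FlatInfo {
    std::vector<std::uint64_t> offsets;  // number of indices + 1 elements
    std::vector<ABC*> entries;           // all pointers, grouped by index

    std::size_t count(std::size_t i) const { return offsets[i + 1] - offsets[i]; }
    ABC* const* begin(std::size_t i) const { return entries.data() + offsets[i]; }
    ABC* const* end(std::size_t i) const { return entries.data() + offsets[i + 1]; }
};

// Builds the layout from (index, pointer) pairs in two passes:
// count per index, then place each pointer into its slot.
FlatInfo build_flat(std::size_t num_indices,
                    const std::vector<std::pair<std::uint64_t, ABC*>>& items) {
    FlatInfo f;
    f.offsets.assign(num_indices + 1, 0);
    for (const auto& item : items) {
        ++f.offsets[item.first + 1];      // per-index counts, shifted by one
    }
    for (std::size_t i = 0; i < num_indices; ++i) {
        f.offsets[i + 1] += f.offsets[i]; // prefix sums give the start offsets
    }
    f.entries.resize(items.size());
    std::vector<std::uint64_t> cursor(f.offsets.begin(), f.offsets.end() - 1);
    for (const auto& item : items) {
        f.entries[cursor[item.first]++] = item.second;
    }
    return f;
}
```

The trade-off is that inserting into the middle later on is expensive, so this only fits if the answer to the questions above is "no".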

So, what is it you really want to do? Do you really need dynamic growth on both the main container and its elements? Do you really need random manipulation? Or can you use fixed-size arrays for both container and ABC objects and use iteration instead? When contemplating this you might want to read up on the available containers and their properties on en.cppreference.com. It will help you decide what is most appropriate.

*For the fun of it I dug around in VS2017's implementation and it creates an entirely new vector without the growth segment, copies the old elements and then reassigns the internal pointers of the old vector to the new one while deleting the old memory. So at least with that compiler you can count on memory being released.

Upvotes: 0

robthebloke

Reputation: 9678

std::list is a fragmented memory container. Typically each node MUST have the data it is storing, plus the 2 prev/next pointers, and then you have to add in the space required within the OS allocation table (typically 16 or 32 bytes per allocation - depending on OS). You then have to account for the fact all allocations must be returned on a 16byte boundary (on Intel/AMD based 64bit machines anyway).

So using the example of std::list<ABC*>, the size of a pointer is 8 bytes; however, you will need at least 48 bytes to store each element.

So memory usage for ONLY the list entries is going to be around: 3638243731 * 48(bytes) = ~162Gb. This is of course assuming that there is no memory fragmentation (where there may be a block of 62bytes free, and the OS returns the entire block of 62 rather than the 48 requested). We are also assuming here that the OS has a minimum allocation size of 48 bytes (and not say, 64bytes, which would not be overly silly, but would push the usage up far higher).

The size of the std::lists themselves within the vector comes to around 18GB. So in total we are looking at 180Gb at least to store that vector. It would not be beyond the realm of possibility that the extra allocations are additional OS book keeping info, for all of those individual memory allocations (e.g. lists of loaded memory pages, lists of swapped out memory pages, the read/write/mmap permissions, etc, etc).
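The arithmetic above can be reproduced with a few lines. The two per-item sizes are parameters because they are assumptions: 48 bytes per node is the figure used above, and 32 bytes per list object is what gives roughly the 18GB quoted for the vector buffer. The names `Estimate`, `estimate` and `to_gib` are mine.

```cpp
#include <cstdint>

const std::uint64_t kLists = 613284686ULL;     // vec.size() from the question
const std::uint64_t kEntries = 3638243731ULL;  // total pointers in all lists

struct Estimate {
    std::uint64_t node_bytes;  // all list nodes
    std::uint64_t list_bytes;  // the list objects inside the vector
    std::uint64_t total_bytes;
};

// Lower-bound estimate for the assumed per-node and per-list sizes.
Estimate estimate(std::uint64_t bytes_per_node, std::uint64_t bytes_per_list) {
    Estimate e;
    e.node_bytes = kEntries * bytes_per_node;
    e.list_bytes = kLists * bytes_per_list;
    e.total_bytes = e.node_bytes + e.list_bytes;
    return e;
}

// Whole gibibytes (2^30 bytes), rounded down.
std::uint64_t to_gib(std::uint64_t bytes) { return bytes >> 30; }
```

With estimate(48, 32) this gives 162 GiB for the nodes and 180 GiB in total, which matches the figures above when they are read as binary gigabytes.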

As a final note, instead of using swap on a newly constructed vector, you can just use shrink_to_fit (it is a non-binding request, so the implementation is allowed to ignore it).

ccInfo.clear();
ccInfo.shrink_to_fit();

Upvotes: 0
