Bob5421

Reputation: 9073

User space memory fragmentation

Let's suppose a very basic C++ program which allocates a huge number of small std::vectors. I don't really know how the compiler and the OS will place those vectors in the process memory space, but if the number is big, I think some vectors could end up close to each other.

Now, let's suppose I delete some of the vectors and keep a few others. Imagine I want to add 10000 items to the first vector.

What will happen if the second vector is too close in memory? Do you think I will get a "low memory" error, or will the OS move the first vector?

Upvotes: 2

Views: 195

Answers (2)

pbos

Reputation: 476

When your std::vector runs out of capacity, it will reallocate its storage (usually to about twice the current capacity; see amortized complexity) and move the elements already in the vector. It moves the data pointer inside the vector; it does not move the vector object itself (your vector and your vector's data live in different locations).

Your std::vector and the elements "inside" it are normally not in the same spot. This simplified pseudo-implementation glosses over many details but might illustrate how push_back scales internally:

namespace std {

template <typename T>
class vector {
  size_t size_;
  size_t capacity_;
  T* data_;  // Stored elsewhere on the heap.

 public:
  void push_back(const T& foo) {
    if (size_ == capacity_) {
      capacity_ = capacity_ ? capacity_ * 2 : 1;  // grow geometrically; assumes non-wrapping size
      data_ = (T*)realloc(data_, capacity_ * sizeof(T));  // assumes POD types and no realloc failure
    }
    data_[size_++] = foo;
  }
};

}  // namespace std

realloc here may move the data inside the vector, so any old pointers to &vector[0] are garbage after push_back reallocates the storage. realloc takes care of finding a contiguous segment that's large enough to store the new capacity's worth of elements (it might have to mmap more memory).

Another example that explains the separation:

int main() {
  std::vector<float> numbers;  // the vector is on the stack and never moves.

  numbers.push_back(5.0f);
  // 5.0f is stored inside vector data, which may be on the heap. 
  // Adding more items may allocate heap memory and move all previous items.

  return 0;
}

Upvotes: 2

Philipp Claßen

Reputation: 43970

No, it does not matter whether vectors are close to each other. Only if a vector grows to a size where no contiguous block of memory is left to hold its elements will you get an error (for the default allocator, an std::bad_alloc exception will be thrown).

What happens internally is similar to what you probably mean by moving, but in C++11 that term has a different meaning, so I will try to avoid it and would rather call it reallocation. Also note that the operating system has nothing to do with it.

Let us look at the details:

It is correct that an std::vector is contiguous, but (in contrast to an std::array) its elements are not directly stored inside the std::vector instance itself. Instead, it stores the underlying array on the heap and only keeps a pointer to it.

For efficiency reasons, implementations are allowed to make its internal array bigger than the number of elements stored in the array. For example:

std::vector<int> v;
assert(v.size() == 0);
assert(v.capacity() >= v.size()); // capacity can be higher than size

When you add new elements to the vector (e.g., via v.push_back), one of the following two things will happen:

  • If there is enough space left (i.e., v.size() < v.capacity()), the new element can be added without any extra memory allocation
  • Otherwise, the underlying array has to be increased, which involves the following steps:

    1. A new (larger) array is allocated.
    2. All elements from the old array have to be copied (or moved) to the new array.
    3. The new array replaces the old array (which is deallocated), and the new element can be inserted.

It is important to note that the std::vector instance itself stays at the same memory address; only its internal pointer now points to the newly created (larger) array. In that respect, the data has been moved to a different memory location. (That also has consequences: for instance, all iterators and pointers that you kept to the elements are now invalidated.)

The critical operation is the reallocation of the memory. Here, memory fragmentation comes into play: it can happen that, because of fragmentation, it is not possible to allocate the new array even though there would be enough space without fragmentation.

In contrast to other languages, it is not possible in C++ to avoid fragmentation the way a compacting garbage collector does (e.g., some GC implementations in Java are compacting). For the same reason, the operating system cannot help to avoid memory fragmentation in C++. At least in theory. In practice, on today's 64-bit systems (with virtual memory), memory fragmentation is less of a concern than it used to be.

If you do not need the elements of your container to be contiguous, you can use std::deque instead of std::vector. It is more robust against memory fragmentation because it does not keep one big array but several smaller blocks. On the other hand, std::vector is typically more efficient, so I would still use the vector by default, but here is an old article from Herb Sutter that touches on the topic: Using Vector and Deque

Upvotes: 3
