Bob5421

Reputation: 9073

User space memory fragmentation

Let's suppose a very basic C++ program which allocates a huge number of small std::vectors. I don't really know how the compiler and the OS will place those vectors in the process memory space, but if the number is big, I think some vectors could end up close to each other.

Now, let's suppose I delete some of the vectors and keep a few others. Imagine I want to add 10000 items to the first vector.

What will happen if the second vector is too close in memory? Do you think I will get a "low memory" error, or will the OS move the first vector?

Upvotes: 2

Views: 195

Answers (2)

pbos

Reputation: 476

When your std::vector runs out of capacity, it will reallocate its storage (usually to about twice the current capacity; see amortized complexity) and move the elements already in the vector. It moves the data pointer inside the vector; it does not move the vector object itself (your vector and your vector's data live in different locations).

Your std::vector and the elements "inside" it are normally not in the same spot. This simplified pseudo-implementation glosses over many details but might illustrate how push_back scales internally:

namespace std {

template <typename T>
class vector {
  size_t size_;
  size_t capacity_;
  T* data_;  // Stored elsewhere on the heap.

 public:
  void push_back(const T& foo) {
    if (size_ == capacity_) {
      capacity_ = capacity_ ? capacity_ * 2 : 1;  // grow geometrically; assumes non-wrapping size
      data_ = (T*)realloc(data_, capacity_ * sizeof(T));  // assumes POD types and no realloc failure
    }
    data_[size_++] = foo;
  }
};

}  // namespace std

realloc here may move the data inside the vector, so any old pointers to &vector[0] are garbage after push_back reallocates the storage. realloc takes care of finding a contiguous segment that's large enough to store the new capacity's worth of elements (it might have to mmap more memory).

Another example that explains the separation:

int main() {
  std::vector<float> numbers;  // the vector is on the stack and never moves.

  numbers.push_back(5.0f);
  // 5.0f is stored inside vector data, which may be on the heap. 
  // Adding more items may allocate heap memory and move all previous items.

  return 0;
}

Upvotes: 2

Philipp Claßen

Reputation: 43970

No, it does not matter whether vectors are close to each other. Only if a vector grows to a size where no contiguous block of memory is left to hold its elements will you get an error (for the default allocator, an std::bad_alloc exception will be thrown).

What happens internally is similar to what you probably mean by moving, but in C++11 that term has a different meaning, so I will try to avoid it and would rather call it reallocation. Also note that the operating system has nothing to do with it.

Let us look at the details:

It is correct that an std::vector is contiguous, but (in contrast to an std::array) its elements are not directly stored inside the std::vector instance itself. Instead, it stores the underlying array on the heap and only keeps a pointer to it.

For efficiency reasons, implementations are allowed to make its internal array bigger than the number of elements stored in the array. For example:

std::vector<int> v;
assert(v.size() == 0);
assert(v.capacity() >= v.size()); // capacity can be higher than size

When you add new elements to the vector (e.g., via v.push_back), one of the following two things will happen:

  • If there is enough space left (i.e., v.size() < v.capacity()), the new element can be added without any extra memory allocation
  • Otherwise, the underlying array has to be increased, which involves the following steps:

    1. A new (larger) array is allocated.
    2. All elements from the old array have to be copied (or moved) to the new array.
    3. The new array replaces the old array (which is deallocated), and the new element can be inserted.

It is important to note that the std::vector instance itself stays at the same memory address; only its internal pointer now points to the newly created (larger) array. In that respect, the data has been moved to a different memory location. (That also has consequences: for instance, all iterators and pointers that you kept to the elements are now invalidated.)

The critical operation is the reallocation of the memory. Here, memory fragmentation comes into play: it can happen that, because of fragmentation, it is not possible to allocate the new array even though there would be enough space without fragmentation.

In contrast to other languages, it is not possible in C++ to avoid fragmentation the way a compacting garbage collector does (e.g., some GC implementations in Java are compacting). For the same reason, the operating system cannot help to avoid memory fragmentation in C++. At least in theory. In practice, on today's 64-bit systems (with virtual memory), memory fragmentation is less of a concern than it used to be.

If you do not need the elements of your container to be contiguous, you can use std::deque instead of std::vector. It is more robust against memory fragmentation because it does not keep one big array but several smaller blocks. On the other hand, std::vector is typically more efficient, so I would still use the vector by default, but here is an old article from Herb Sutter that touches on the topic: Using Vector and Deque

Upvotes: 3
