Reputation: 9073
Let's suppose a very basic C++ program that allocates a huge number of small std::vector's. I don't really know how the compiler and the OS will place those vectors in the process memory space, but if the number is big, I think some vectors could end up close to each other.
Now, let's suppose I delete some of those vectors and keep a few others. Imagine I want to add 10000 items to the first vector.
What will happen if the second vector is too close in memory? Will I get a "low memory" error, or will the OS move the first vector?
Upvotes: 2
Views: 195
Reputation: 476
When your std::vector runs out of capacity, it will reallocate space (usually twice the current capacity, see amortized complexity) and move the elements already in the vector. It moves the data pointer inside the first vector; it won't move the vector object itself (your vector and your vector's data are in different locations).
Your std::vector and the elements "inside" it are normally not in the same spot. This incomplete pseudo-implementation is wrong for a number of reasons, but it might illustrate how push_back grows the storage internally:
namespace std {
    template<typename T>
    class vector {
        size_t size_;
        size_t capacity_;
        T* data_; // Stored elsewhere on the heap.

    public:
        void push_back(const T& foo) {
            if (size_ == capacity_) {
                capacity_ *= 2; // assuming capacity_ > 0, and non-wrapping size
                // assumes POD types and no realloc failures; may move the data to a new address
                data_ = static_cast<T*>(realloc(data_, capacity_ * sizeof(T)));
            }
            data_[size_++] = foo; // write the new element, then grow the size
        }
    };
}
realloc here may move the data inside the vector, so any old references to &vector[0] are garbage after push_back reallocates the storage. realloc takes care of finding a contiguous segment that is large enough to store the N elements (it might have to mmap more memory).
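A minimal sketch of that invalidation (whether the data actually moves, and at which push_back, is an implementation detail, so the printed addresses are only illustrative):
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    v.push_back(1);
    // Remember where the data currently lives (stored as an integer so it stays printable).
    auto old_addr = reinterpret_cast<std::uintptr_t>(v.data());

    for (int i = 0; i < 1000; ++i)      // force one or more reallocations
        v.push_back(i);

    // Any pointer or reference obtained before the reallocations is now dangling;
    // only v.data() points at the current block.
    auto new_addr = reinterpret_cast<std::uintptr_t>(v.data());
    std::cout << std::hex << "old block: 0x" << old_addr
              << "\ncurrent block: 0x" << new_addr << "\n";
}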
Another example that explains the separation:
#include <vector>

int main() {
    std::vector<float> numbers; // the vector object is on the stack and never moves.
    numbers.push_back(5.0f);
    // 5.0f is stored inside the vector's data, which may be on the heap.
    // Adding more items may allocate heap memory and move all previous items.
    return 0;
}
Upvotes: 2
Reputation: 43970
No, it does not matter if vectors are close to each other. You will only get an error if a vector reaches a size for which no contiguous block of memory is left to hold its data (with the default allocator, an std::bad_alloc exception is thrown).
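As a rough sketch of that failure mode (assuming a 64-bit build; if the request exceeds max_size(), the library throws std::length_error instead, so the exact exception can vary by platform):
#include <iostream>
#include <new>
#include <vector>

int main() {
    std::vector<char> v;
    try {
        // Ask for far more contiguous memory than the system can provide.
        v.resize(1ull << 62);
    } catch (const std::bad_alloc& e) {
        std::cout << "allocation failed: " << e.what() << "\n";
    }
}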
What happens internally is similar to what you probably mean by moving, but in C++11 that term has a different meaning, so I will avoid it and rather speak of reallocation. Also note that the operating system has nothing to do with it.
Let us look at the details:
It is correct that an std::vector is contiguous, but (in contrast to an std::array) its elements are not stored directly inside the std::vector instance itself. Instead, it stores the underlying array on the heap and only keeps a pointer to it.
For efficiency reasons, implementations are allowed to make this internal array bigger than the number of elements stored in it. For example:
std::vector<int> v;
assert(v.size() == 0);
assert(v.capacity() >= v.size()); // at least v.size(), but can be higher
When you add new elements to the vector (e.g., via v.push_back), one of the following two things will happen:
- If there is still room in the internal array (i.e., v.size() < v.capacity()), the new element can be added without any extra memory allocation.
- Otherwise, the underlying array has to be enlarged, which involves the following steps: allocate a new, larger array; copy (or move) the existing elements into it; destroy the old elements and free the old array; let the internal pointer point to the new array.
It is important to note that the std::vector instance itself stays at the same memory address; only its internal pointer will now point to the newly created (larger) array. In that respect, the data has been moved to a different memory location. (That also has consequences; for instance, all iterators and pointers that you kept to the elements are now invalidated.)
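A small sketch of this behavior (whether the data address actually changes depends on the implementation and on where the allocator happens to place the new block, so the output is only illustrative):
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    v.push_back(1);

    std::cout << "vector object at:   " << &v << "\n";
    std::cout << "data before growth: " << v.data() << "\n";

    v.reserve(v.capacity() + 1000); // forces a reallocation of the internal array

    std::cout << "vector object at:   " << &v << "\n";      // unchanged
    std::cout << "data after growth:  " << v.data() << "\n"; // usually different
}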
The critical operation is the reallocation of the memory. Here, memory fragmentation comes into play. Because of fragmentation, it can happen that the new array cannot be allocated even though there would be enough free space in total.
In contrast to other languages, it is not possible in C++ to avoid fragmentation the way a compacting garbage collector does (e.g., some GC implementations in Java are compacting). Likewise, the operating system cannot help to avoid memory fragmentation in C++. At least in theory. In practice, on today's 64-bit systems (with virtual memory), memory fragmentation is less of a concern than it used to be.
If you do not need the property that the elements in your container are contiguous, you can use std::deque instead of std::vector. It is more robust against memory fragmentation because it does not keep one big array but several smaller blocks. On the other hand, std::vector is typically more efficient, so by default I would still use the vector, but here is an old article from Herb Sutter that touches on the topic: Using Vector and Deque
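A minimal sketch of how drop-in that change can be (only the interface common to both containers is used here; std::deque additionally offers cheap push_front):
#include <deque>
#include <iostream>

int main() {
    std::deque<int> d;           // elements live in several smaller blocks, not one big array
    for (int i = 0; i < 10000; ++i)
        d.push_back(i);          // growing never requires one huge contiguous allocation

    int& first = d.front();
    d.push_back(42);             // unlike vector, this does not invalidate references to existing elements
    std::cout << first << " " << d.size() << "\n";
}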
Upvotes: 3