Reputation: 6587
Consider the following code:
fn main() {
let mut vec: Vec<u8> = Vec::new();
vec.push(0);
vec.push(1);
vec.push(2);
vec.push(3);
vec.push(4);
vec.push(5);
vec.push(6);
vec.push(7);
vec.push(8);
}
When Vec::new()
is called, Rust doesn't know how much to allocate, and each time it needs to allocate more space for a vector, it calls malloc
with the new size and then clones all the data from the old location in the heap to the new, right?
What is Rust's strategy for knowing the new size to allocate?
For example, does Rust allocate each time something is pushed onto the vector, like this?
[]
[0]
[0, 1]
[0, 1, 2]
[0, 1, 2, 3]
etc...
That seems inefficient. Maybe Rust does something like this:
[]
[0, <empty>]
[0, 1]
[0, 1, 2, <empty>, <empty>, <empty>, <empty>, <empty>]
[0, 1, 2, 3, <empty>, <empty>, <empty>, <empty>]
[0, 1, 2, 3, 4, <empty>, <empty>, <empty>]
etc...
Upvotes: 5
Views: 5150
Reputation: 5387
Vec
has a length and a capacity. The capacity is the number of elements the Vec
can store without reallocating. When the capacity is reached and a new element is pushed, the capacity is increased and the Vec
is reallocated.
The strategy is not specified and might change in the future. The current strategy is to double the capacity when it must be increased. However, there is a special case: Vec::new()
doesn't allocate, but pushing an element for the first time allocates enough space for up to 8 elements:
bool
or u8
), the capacity is set to 8, which is 64 bitsNote that this is an implementation detail and can change. In fact, it changed recently.
Upvotes: 7
Reputation: 30577
The same problem is faced by any language that supports dynamic growable containers. They almost invariably allocate chunks at a time to minimize allocations and relocations.
In fact the documentation for Vec says as much. The reserve method for example:
Reserves capacity for at least additional more elements to be inserted in the given Vec. The collection may reserve more space to avoid frequent reallocations.
The actual algorithm used is not documented in the std documentation (that I could find) but you can see the implementation in the std library source code, in module raw_vec, function grow_amortized. The size is doubled each time the vector needs to be grown, in most cases.
If you know in advance how big the Vec needs to be, you can avoid the overhead of resizing by calling Vec::with_capacity.
By the way, Vec illustrates how important Rust's safety rules are. If you were to take a reference to an element in the Vec, and while holding that reference you triggered a re-allocation, the reference would be pointing to the old location, and thus invalid. Thanks to the borrow checker, and the fact that if you have one mutable borrow of the Vec you cannot have any other borrows, the compiler would not let you make that mistake.
Upvotes: 3
Reputation: 1447
The documentation on Vec says this:
Vec does not guarantee any particular growth strategy when reallocating when full, nor when reserve is called. The current strategy is basic and it may prove desirable to use a non-constant growth factor. Whatever strategy is used will of course guarantee O(1) amortized push.
The guarantee of amortized O(1) push implies that Rust does not reallocate on every push. It must look more like the second scenario that you described, where it allocates extra capacity to make room for additional elements to be pushed.
If we dig a little into the source code for the Rust standard library, we will see that in this situation (pushing one element at a time), the current implementation of Vec in fact doubles the capacity of the Vec on each reallocation. This exact strategy, of course, is not specified and could change in future implementations.
If you know from the start how big your Vec will need to be, you can construct it with Vec::with_capacity to avoid the need for any reallocations.
Upvotes: 12