D. Pardal
D. Pardal

Reputation: 6587

How does Rust allocate more space for a vector?

Consider the following code:

fn main() {
  let mut vec: Vec<u8> = Vec::new();
  vec.push(0);
  vec.push(1);
  vec.push(2);
  vec.push(3);
  vec.push(4);
  vec.push(5);
  vec.push(6);
  vec.push(7);
  vec.push(8);
}

When Vec::new() is called, Rust doesn't know how much to allocate, and each time it needs to allocate more space for a vector, it calls malloc with the new size and then clones all the data from the old location in the heap to the new, right?

What is Rust's strategy for knowing the new size to allocate?

For example, does Rust allocate each time something is pushed onto the vector, like this?

[]
[0]
[0, 1]
[0, 1, 2]
[0, 1, 2, 3]
etc...

That seems inefficient. Maybe Rust does something like this:

[]
[0, <empty>]
[0, 1] 
[0, 1, 2, <empty>, <empty>, <empty>, <empty>, <empty>]
[0, 1, 2, 3, <empty>, <empty>, <empty>, <empty>]
[0, 1, 2, 3, 4, <empty>, <empty>, <empty>]
etc...

Upvotes: 5

Views: 5150

Answers (3)

Aloso
Aloso

Reputation: 5387

Vec has a length and a capacity. The capacity is the number of elements the Vec can store without reallocating. When the capacity is reached and a new element is pushed, the capacity is increased and the Vec is reallocated.

The strategy is not specified and might change in the future. The current strategy is to double the capacity when it must be increased. However, there is a special case: Vec::new() doesn't allocate, but pushing an element for the first time allocates enough space for up to 8 elements:

  • If each element is only 1 byte (e.g. bool or u8), the capacity is set to 8, which is 64 bits
  • If the elements are between 2 byte and 1 KiB big, the capacity is set to 4
  • Otherwise, the capacity is set to 1

Note that this is an implementation detail and can change. In fact, it changed recently.

Upvotes: 7

harmic
harmic

Reputation: 30577

The same problem is faced by any language that supports dynamic growable containers. They almost invariably allocate chunks at a time to minimize allocations and relocations.

In fact the documentation for Vec says as much. The reserve method for example:

Reserves capacity for at least additional more elements to be inserted in the given Vec. The collection may reserve more space to avoid frequent reallocations.

The actual algorithm used is not documented in the std documentation (that I could find) but you can see the implementation in the std library source code, in module raw_vec, function grow_amortized. The size is doubled each time the vector needs to be grown, in most cases.

If you know in advance how big the Vec needs to be, you can avoid the overhead of resizing by calling Vec::with_capacity.

By the way, Vec illustrates how important Rust's safety rules are. If you were to take a reference to an element in the Vec, and while holding that reference you triggered a re-allocation, the reference would be pointing to the old location, and thus invalid. Thanks to the borrow checker, and the fact that if you have one mutable borrow of the Vec you cannot have any other borrows, the compiler would not let you make that mistake.

Upvotes: 3

Brent Kerby
Brent Kerby

Reputation: 1447

The documentation on Vec says this:

Vec does not guarantee any particular growth strategy when reallocating when full, nor when reserve is called. The current strategy is basic and it may prove desirable to use a non-constant growth factor. Whatever strategy is used will of course guarantee O(1) amortized push.

The guarantee of amortized O(1) push implies that Rust does not reallocate on every push. It must look more like the second scenario that you described, where it allocates extra capacity to make room for additional elements to be pushed.

If we dig a little into the source code for the Rust standard library, we will see that in this situation (pushing one element at a time), the current implementation of Vec in fact doubles the capacity of the Vec on each reallocation. This exact strategy, of course, is not specified and could change in future implementations.

If you know from the start how big your Vec will need to be, you can construct it with Vec::with_capacity to avoid the need for any reallocations.

Upvotes: 12

Related Questions