Ogromny

Reputation: 33

Why does calling Vec::resize before calling Vec::set_len cause the Vec to have data?

I have a problem that I do not understand:

fn cipher_with(key: &[u8], data: &[u8]) -> Vec<u8> {
    let data_len = 16;

    let mut data = data.to_vec();
    data.resize(data_len, 2);

    let mut output = Vec::<u8>::with_capacity(data_len);
    unsafe { output.set_len(data_len) }

    output
}

fn main() {
    let key = "blabla".as_bytes();
    let data = "lorem ipsum.".as_bytes();
    println!("{:?}", cipher_with(&key, &data));
}

This prints:

[108, 111, 114, 101, 109, 32, 105, 112, 115, 117, 109, 46, 0, 0, 0, 0]

How does this happen? I never wrote these values to output.

Upvotes: 3

Views: 637

Answers (2)

Peter Hall

Reputation: 58805

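You are using unsafe Rust, which can give you unpredictable results. Vec::set_len requires that the new length not exceed the capacity and that all elements up to the new length are already initialized; reading them otherwise is undefined behaviour.

In this particular case, you are extending the length of the Vec over uninitialized memory. The values are whatever happens to be there already.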

So let's look at some of the code:

let mut data = data.to_vec();

This copies the data "lorem ipsum." onto the heap in the form of a vector.

data.resize(data_len, 2); // data_len = 16

This increases the length of the Vec from 12 to 16 items, which happen to be bytes in this case, and fills the four new slots with the value 2. The buffer created by to_vec() typically only has room for the original 12 bytes, so it has to grow. Based on what we are seeing, it looks like the allocator did not extend the existing block in place: it allocated a new memory range, copied the data there, and released the first one.
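To see what resize does on its own, here is a minimal sketch that isolates that call. The helper name padded is made up for illustration and is not part of the question's code.

```rust
// Hypothetical helper: copy `input` to the heap and grow it to `len`
// bytes, filling any new slots with the value 2 (as in the question).
fn padded(input: &[u8], len: usize) -> Vec<u8> {
    let mut data = input.to_vec();
    // resize changes the *length*; it reallocates only if the
    // current capacity is too small.
    data.resize(len, 2);
    data
}

fn main() {
    let data = padded(b"lorem ipsum.", 16);
    // The 12 original bytes are followed by four 2s.
    println!("{:?}", data);
}
```

Note that the padding here is 2, not the 0 seen in the question's output, which is a hint that the printed bytes did not come from the resized vector itself.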

let mut output = Vec::<u8>::with_capacity(data_len);
unsafe { output.set_len(data_len) }

This creates a new vector and unsafely gives it a length. But you didn't initialise it, so the data will be what was there previously.

It looks like data.resize() moved the contents to a new allocation instead of growing the vector in place, freeing the original block. When output was allocated, it was given that same freed block, which is why it contains "lorem ipsum.".
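If the goal is simply an output buffer of a known length, a safe alternative avoids the problem entirely. This is a sketch, assuming zero is an acceptable initial value; the function name zeroed_output is hypothetical.

```rust
// Hypothetical helper: build an output buffer of `len` bytes that is
// fully initialized to zero, with no unsafe code.
fn zeroed_output(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

fn main() {
    let output = zeroed_output(16);
    // Every byte is a defined value, regardless of what the allocator
    // handed back.
    println!("{:?}", output);
}
```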

Upvotes: 3

Shepmaster

Reputation: 431619

To add some details to Peter's answer, check out this annotated version:

fn cipher_with(key: &[u8], data: &[u8]) -> Vec<u8> {
    let data_len = 16;

    let mut data = data.to_vec();
    println!("{:?}", data.as_ptr());
    data.resize(data_len, 2);
    println!("{:?}", data.as_ptr());

    let mut output = Vec::<u8>::with_capacity(data_len);
    println!("{:?}", output.as_ptr());
    unsafe { output.set_len(data_len) }

    output
}

This prints:

0x7fa6dba27000
0x7fa6dba1e0c0
0x7fa6dba27000

When the first vector is created, it has a length of 12. When it's resized to 16, a new allocation is made and the data copied. This is likely due to the implementation of the allocator, which usually chunks allocations into buckets. 16 would be a reasonable bucket size.

When the second vector is created, the allocator hands back the same pointer that the first vector just gave up. Since nothing else has changed this memory in the meantime, it still contains the bytes that the first vector left there.
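For cases where set_len really is wanted, one sound pattern is to write every element into the spare capacity first and only then set the length. This is a sketch under that assumption, not the only way to do it; the function name filled_output and its parameters are hypothetical.

```rust
// Hypothetical helper: allocate `len` bytes, initialize each one to
// `value` through the spare capacity, and only then set the length.
fn filled_output(len: usize, value: u8) -> Vec<u8> {
    let mut output = Vec::<u8>::with_capacity(len);

    // spare_capacity_mut exposes the uninitialized tail as
    // MaybeUninit<u8> slots. The capacity may be larger than `len`,
    // so only the first `len` slots are written.
    for slot in output.spare_capacity_mut().iter_mut().take(len) {
        slot.write(value);
    }

    // SAFETY: `len` does not exceed the capacity, and the first `len`
    // elements were initialized by the loop above.
    unsafe { output.set_len(len) };

    output
}

fn main() {
    println!("{:?}", filled_output(16, 7));
}
```

Because every slot is written before the length changes, the result no longer depends on what a previous allocation left behind.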

Upvotes: 5
