Timmmm
Timmmm

Reputation: 96832

How is TinyVec the same size as Vec?

TinyVec is defined as:

pub enum TinyVec<A: Array> {
  #[allow(missing_docs)]
  Inline(ArrayVec<A>),
  #[allow(missing_docs)]
  Heap(Vec<A::Item>),
}

However if you run this code you get interesting results (playground):

use tinyvec::*;

fn main() {
    dbg!(std::mem::size_of::<Vec<u8>>());
    
    dbg!(std::mem::size_of::<TinyVec<[u8; 13]>>());
    dbg!(std::mem::size_of::<TinyVec<[u8; 14]>>());
    dbg!(std::mem::size_of::<TinyVec<[u8; 15]>>());
    dbg!(std::mem::size_of::<TinyVec<[u8; 16]>>());
}

Output:

[src/main.rs:4] std::mem::size_of::<Vec<u8>>() = 24
[src/main.rs:6] std::mem::size_of::<TinyVec<[u8; 13]>>() = 24
[src/main.rs:7] std::mem::size_of::<TinyVec<[u8; 14]>>() = 24
[src/main.rs:8] std::mem::size_of::<TinyVec<[u8; 15]>>() = 32
[src/main.rs:9] std::mem::size_of::<TinyVec<[u8; 16]>>() = 32

I know Rust can optimise enum sizes using "niches" - basically it puts the discriminant in some unused space, or even unused values of the variants. But I don't really understand how it is doing that for Vec<>.

There are definitely configurations of Vec<> that are invalid and thus could be used for the discriminant, e.g. size > capacity, null data pointer && size > 0, etc. But is Rust really smart enough to figure this out, or is it a hand-coded niche?

Upvotes: 8

Views: 482

Answers (1)

user4815162342
user4815162342

Reputation: 155246

The secret is that ArrayVec<[u8; 14]> is smaller than Vec<u8> and takes 16 bytes:

pub struct ArrayVec<A> {
  len: u16,
  pub(crate) data: A, // A = [u8; 14]
}

Compared to Vec's 24 bytes, this leaves 8 bytes that TinyVec<[u8; 14]> is not using. These bytes being zero can represent the Inline variant, and them being non-zero can represent the Heap variant. In other words, the compiler is smart enough to use the data pointer part of Vec, which is NonNull<T>, as the niche.

Thus TinyVec is either:

struct TinyVec<A> {
    // ArrayVec (first 8 bytes are zero):
    this_is_null: *mut [u8; 14],
    len: u16,
    data: [u8; 14],
}

or:

struct TinyVec<A> {
    // Vec (first 8 bytes are non-zero and point to data):
    data: *mut [u8; 14],
    len: usize,
    capacity: usize,
}

Value of the first field serves as the discriminator between the two possibilities.

This can be observed with this unsafe code:

type TinyVec = tinyvec::TinyVec<[u8; 14]>;

fn main() {
    assert_eq!(std::mem::size_of::<TinyVec>(), 24);
    let a: TinyVec = (0..10).collect();
    let b: TinyVec = (0..20).collect();
    unsafe {
        // XXX this makes assumptions about the layout of TinyVec and Vec
        // not guaranteed by rustc
        println!("{:?}", std::mem::transmute::<_, &(usize, u16, [u8; 14])>(&a));
        println!("{:?}", std::mem::transmute::<_, &(usize, usize, usize)>(&b));
    }
}

Playground

It prints something like:

(0, 10, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0])
(94630321433040, 20, 20)

The first line shows inline representation: null niche, length 10, and the array of values. The second line shows the heap representation: non-null niche (data pointer pointing to the heap) followed by length and capacity.

Upvotes: 7

Related Questions