How is TinyVec the same size as Vec?

Question

TinyVec is defined as:

pub enum TinyVec {
  #[allow(missing_docs)]
  Inline(ArrayVec),
  #[allow(missing_docs)]
  Heap(Vec),
}

However if you run this code you get interesting results (playground):

use tinyvec::*;

fn main() {
    dbg!(std::mem::size_of::>());
    
    dbg!(std::mem::size_of::>());
    dbg!(std::mem::size_of::>());
    dbg!(std::mem::size_of::>());
    dbg!(std::mem::size_of::>());
}

Output:

[src/main.rs:4] std::mem::size_of::>() = 24
[src/main.rs:6] std::mem::size_of::>() = 24
[src/main.rs:7] std::mem::size_of::>() = 24
[src/main.rs:8] std::mem::size_of::>() = 32
[src/main.rs:9] std::mem::size_of::>() = 32

I know Rust can optimise enum sizes using "niches" - basically it puts the discriminant in some unused space, or even unused values of the variants. But I don't really understand how it is doing that for Vec<>.

There are definitely configurations of Vec<> that are invalid and thus could be used for the discriminant, e.g. size > capacity, null data pointer && size > 0, etc. But is Rust really smart enough to figure this out, or is it a hand-coded niche?

user4815162342 · Accepted Answer

The secret is that ArrayVec<[u8; 14]> is smaller than Vec and takes 16 bytes:

pub struct ArrayVec {
  len: u16,
  pub(crate) data: A, // A = [u8; 14]
}

Compared to Vec's 24 bytes, this leaves 8 bytes that TinyVec<[u8; 14]> is not using. These bytes being zero can represent the Inline variant, and them being non-zero can represent the Heap variant. In other words, the compiler is smart enough to use the data pointer part of Vec, which is NonNull, as the niche.

Thus TinyVec is either:

struct TinyVec {
    // ArrayVec (first 8 bytes are zero):
    this_is_null: *mut [u8; 14],
    len: u16,
    data: [u8; 14],
}

or:

struct TinyVec {
    // Vec (first 8 bytes are non-zero and point to data):
    data: *mut [u8; 14],
    len: usize,
    capacity: usize,
}

Value of the first field serves as the discriminator between the two possibilities.

This can be observed with this unsafe code:

type TinyVec = tinyvec::TinyVec<[u8; 14]>;

fn main() {
    assert_eq!(std::mem::size_of::(), 24);
    let a: TinyVec = (0..10).collect();
    let b: TinyVec = (0..20).collect();
    unsafe {
        // XXX this makes assumptions about the layout of TinyVec and Vec
        // not guaranteed by rustc
        println!("{:?}", std::mem::transmute::<_, &(usize, u16, [u8; 14])>(&a));
        println!("{:?}", std::mem::transmute::<_, &(usize, usize, usize)>(&b));
    }
}

Playground

It prints something like:

(0, 10, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0])
(94630321433040, 20, 20)

The first line shows inline representation: null niche, length 10, and the array of values. The second line shows the heap representation: non-null niche (data pointer pointing to the heap) followed by length and capacity.

How is TinyVec the same size as Vec?

Answers (1)

Related Questions