River Tam

Reputation: 3216

Why does making one enum variant an `f64` increase the size of this enum?

I have created three enums that are nearly identical:

#[derive(Clone, Debug)]
pub enum Smoller {
    Int(u8),
    Four([u8; 4]),
    Eight([u8; 8]),
    Twelve([u8; 12]),
    Sixteen([u8; 16]),
}

#[derive(Clone, Debug)]
pub enum Smol {
    Float(f32),
    Four([u8; 4]),
    Eight([u8; 8]),
    Twelve([u8; 12]),
    Sixteen([u8; 16]),
}

#[derive(Clone, Debug)]
pub enum Big {
    Float(f64),
    Four([u8; 4]),
    Eight([u8; 8]),
    Twelve([u8; 12]),
    Sixteen([u8; 16]),
}

pub fn main() {
    println!("Smoller: {}", std::mem::size_of::<Smoller>()); // => Smoller: 17
    println!("Smol: {}", std::mem::size_of::<Smol>()); // => Smol: 20
    println!("Big: {}", std::mem::size_of::<Big>()); // => Big: 24
}

What I expect, given my understanding of computers and memory, is that these should be the same size. The biggest variant is the [u8; 16] with a size of 16. Therefore, while the first variants differ in size, the enums share the same largest variant and the same number of variants.

I know that Rust can do some optimizations when some types have gaps in their valid values (e.g. pointers can collapse because we know that a valid one is never 0), but this is really the opposite of that. I think if I were constructing this enum by hand, I could fit it into 17 bytes (only one byte being necessary for the discriminant), so both the 20 bytes and the 24 bytes are perplexing to me.

I suspect this might have something to do with alignment, but I don't know why and I don't know why it would be necessary.

Can someone explain this?

Thanks!

Upvotes: 4

Views: 1112

Answers (3)

Matthieu M.

Reputation: 299999

As mcarton mentions, this is an effect of alignment of internal fields and alignment/size rules.


Alignment

Specifically, common alignments for built-in types are:

  • 1: i8, u8.
  • 2: i16, u16.
  • 4: i32, u32, f32.
  • 8: i64, u64, f64.

Note that I say common: in practice alignment is dictated by hardware, and on 32-bit architectures you could reasonably expect f64 to be 4-byte aligned. Further, the alignment of isize, usize and pointers will vary between 32-bit and 64-bit architectures.

In general, for ease of use, the alignment of a compound type is the largest alignment of any of its fields, recursively.
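To check these alignments on your own target, here is a minimal sketch using std::mem::align_of. The enums are cut-down copies of the ones in the question (only the first and last variants are kept), and the values printed depend on the platform, so treat the numbers in the list above as typical rather than guaranteed.

```rust
use std::mem::align_of;

// Cut-down copies of the question's enums: first and largest variant only.
#[allow(dead_code)]
pub enum Smoller {
    Int(u8),
    Sixteen([u8; 16]),
}

#[allow(dead_code)]
pub enum Smol {
    Float(f32),
    Sixteen([u8; 16]),
}

#[allow(dead_code)]
pub enum Big {
    Float(f64),
    Sixteen([u8; 16]),
}

fn main() {
    // Alignment of the primitive payload types on this target.
    println!("u8:  {}", align_of::<u8>());
    println!("f32: {}", align_of::<f32>());
    println!("f64: {}", align_of::<f64>());

    // Each enum takes the largest alignment found among its fields.
    println!("Smoller: {}", align_of::<Smoller>());
    println!("Smol:    {}", align_of::<Smol>());
    println!("Big:     {}", align_of::<Big>());
}
```

On a typical 64-bit target the enum lines should match the u8, f32 and f64 lines respectively.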

Access to unaligned values is architecture-specific: on some architectures it will crash (SIGBUS) or return erroneous data, on some it will be slower (x86/x64 not so long ago), and on others it may be just fine (newer x64, on some instructions).


Size and Alignment

In C, the size must always be a multiple of the alignment, because of the way arrays are laid out and iterated over:

  • Each element in the array must be at its correct alignment.
  • Iterating is done by incrementing the pointer by sizeof(T) bytes.
  • Thus the size must be a multiple of the alignment.

Rust has inherited this behavior^1.
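The rule is just a round-up to the next multiple of the alignment. As a sketch of the arithmetic (round_up is a hypothetical helper written for this illustration, not something from the standard library), starting from the 17 bytes of payload plus discriminant and assuming the common alignments of 1, 4 and 8:

```rust
/// Hypothetical helper: rounds `size` up to the next multiple of `align`.
/// `align` must be non-zero.
fn round_up(size: usize, align: usize) -> usize {
    (size + align - 1) / align * align
}

fn main() {
    // 16 bytes for the largest variant plus 1 byte for the discriminant.
    let unpadded = 16 + 1;

    // (name, assumed alignment) for the three enums in the question.
    for (name, align) in [("Smoller", 1), ("Smol", 4), ("Big", 8)] {
        println!("{name}: {}", round_up(unpadded, align));
    }
}
```

This reproduces the 17, 20 and 24 reported in the question.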

It's interesting to note that Swift decided to define a separate intrinsic, strideof, to represent the stride in an array, which allowed them to remove any tail-padding from the result of sizeof. It did cause some confusion, as people expected sizeof to behave as it does in C, but it allows memory to be compacted more efficiently.

Thus, in Swift, your enums could be represented as:

  • Smoller: [u8 x 16][discriminant] => sizeof 17 bytes, strideof 17 bytes, alignof 1 byte.
  • Smol: [u8 x 16][discriminant] => sizeof 17 bytes, strideof 20 bytes, alignof 4 bytes.
  • Big: [u8 x 16][discriminant] => sizeof 17 bytes, strideof 24 bytes, alignof 8 bytes.

This clearly shows the difference between the size and the stride, which are conflated in C and Rust.
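To see that conflation from the Rust side, here is a small sketch that measures the distance between two neighbouring array elements and compares it with size_of. The stride_of_big function is a hypothetical helper for this illustration, and Big is a shortened copy of the enum from the question.

```rust
use std::mem::size_of;

// Shortened copy of the question's `Big`.
#[allow(dead_code)]
pub enum Big {
    Float(f64),
    Four([u8; 4]),
    Sixteen([u8; 16]),
}

/// Hypothetical helper: distance in bytes between element 0 and
/// element 1 of an array of `Big`, i.e. the array stride.
fn stride_of_big() -> usize {
    let items = [Big::Float(1.0), Big::Four([0; 4])];
    let first = &items[0] as *const Big as usize;
    let second = &items[1] as *const Big as usize;
    second - first
}

fn main() {
    println!("size_of::<Big>():      {}", size_of::<Big>());
    println!("stride in [Big; 2]:    {}", stride_of_big());
    println!("size_of::<[Big; 4]>(): {}", size_of::<[Big; 4]>());
}
```

The stride and the size come out equal, and the array is exactly four times the element size, so the tail padding is counted in size_of.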

^1 I seem to remember some discussions over a possible switch to strideof, which did not come to fruition as we can see, but I could not find a link to them.

Upvotes: 4

mcarton

Reputation: 30061

The size must be at least 17 bytes, because its biggest variant is 16 bytes big, and it needs an extra byte for the discriminant (the compiler can be smart in some cases, and put the discriminant in unused bits of the variants, but it can't do this here).

Also, the size of Big must be a multiple of 8 bytes to align f64 properly. The smallest multiple of 8 bigger than 17 is 24. Similarly, Smol cannot be only 17 bytes, because its size must be a multiple of 4 bytes (the alignment of f32). Smoller only contains u8 so it can be aligned to 1 byte.
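For contrast, here is a sketch of a case where the compiler can hide the discriminant in unused bit patterns. It relies on Option from the standard library rather than the enums in the question: a reference is never null and a NonZeroU8 is never 0, so None can reuse that spare value, whereas a plain u8 has no spare value.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

fn main() {
    // A reference is never null, so `None` can be the all-zero pattern.
    println!("&u8:               {}", size_of::<&u8>());
    println!("Option<&u8>:       {}", size_of::<Option<&u8>>());

    // `NonZeroU8` excludes 0, which leaves one value free for `None`.
    println!("NonZeroU8:         {}", size_of::<NonZeroU8>());
    println!("Option<NonZeroU8>: {}", size_of::<Option<NonZeroU8>>());

    // Every bit pattern of `u8` is valid, so a separate tag is needed.
    println!("u8:                {}", size_of::<u8>());
    println!("Option<u8>:        {}", size_of::<Option<u8>>());
}
```

In the question's enums every bit pattern of the [u8; 16] variant is valid, so no such spare value exists and the extra byte is required.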

Upvotes: 9

rodrigo

Reputation: 98436

I think that it is because of the alignment requirements of the inner values.

u8 has an alignment of 1, so all works as you expect, and you get a whole size of 17 bytes.

But f32 has an alignment of 4 (technically it is arch-dependent, but that is the most likely value). So even if the discriminant is just 1 byte you get this layout for Smol::Float:

[discriminant x 1] [padding x 3] [f32 x 4] = 8 bytes

And then for Smol::Sixteen:

[discriminant x 1] [u8 x 16] [padding x 3] = 20 bytes

Why is this padding really necessary? Because it is a requirement that the size of a type must be a multiple of the alignment, or else arrays of this type will misalign.

Similarly, the alignment for f64 is 8, so you get a full size of 24, which is the smallest multiple of 8 that fits all the variants.
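If you really do want the 17-byte hand-built layout from the question, one way is to store only bytes so that the alignment stays at 1. The following is a rough sketch, not a drop-in replacement for Big: Packed, its FLOAT tag value, and the from_f64/as_f64 methods are all hypothetical names invented for this example, and only the float variant is implemented.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical hand-rolled stand-in for `Big`: a one-byte tag followed
/// by 16 raw payload bytes. Only `u8` fields, so the alignment is 1.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct Packed {
    tag: u8,
    payload: [u8; 16],
}

impl Packed {
    /// Tag value chosen arbitrarily for the float variant.
    const FLOAT: u8 = 0;

    /// Stores the float as its native-endian bytes.
    pub fn from_f64(value: f64) -> Self {
        let mut payload = [0u8; 16];
        payload[..8].copy_from_slice(&value.to_ne_bytes());
        Packed { tag: Self::FLOAT, payload }
    }

    /// Copies the bytes back out, so no aligned f64 ever lives in the struct.
    pub fn as_f64(&self) -> Option<f64> {
        if self.tag != Self::FLOAT {
            return None;
        }
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&self.payload[..8]);
        Some(f64::from_ne_bytes(bytes))
    }
}

fn main() {
    let value = Packed::from_f64(1.5);
    println!("size:  {}", size_of::<Packed>());
    println!("align: {}", align_of::<Packed>());
    println!("value: {:?}", value.as_f64());
}
```

The trade-off is that every read of the float copies eight bytes out of the payload instead of loading an aligned f64 directly.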

Upvotes: 4
