Reputation: 3216
I have created three enums that are nearly identical:
#[derive(Clone, Debug)]
pub enum Smoller {
    Int(u8),
    Four([u8; 4]),
    Eight([u8; 8]),
    Twelve([u8; 12]),
    Sixteen([u8; 16]),
}

#[derive(Clone, Debug)]
pub enum Smol {
    Float(f32),
    Four([u8; 4]),
    Eight([u8; 8]),
    Twelve([u8; 12]),
    Sixteen([u8; 16]),
}

#[derive(Clone, Debug)]
pub enum Big {
    Float(f64),
    Four([u8; 4]),
    Eight([u8; 8]),
    Twelve([u8; 12]),
    Sixteen([u8; 16]),
}

pub fn main() {
    println!("Smoller: {}", std::mem::size_of::<Smoller>()); // => Smoller: 17
    println!("Smol: {}", std::mem::size_of::<Smol>()); // => Smol: 20
    println!("Big: {}", std::mem::size_of::<Big>()); // => Big: 24
}
Given my understanding of computers and memory, I expect these to be the same size. The biggest variant is [u8; 16], with a size of 16 bytes. So while the enums differ in the size of their first variant, their biggest variants are the same size and they have the same total number of variants.
I know that Rust can do some optimizations when a type has gaps in its valid values (e.g. an optional pointer can collapse to the size of a plain pointer, because a valid pointer is never 0), but this is really the opposite of that. If I were constructing this enum by hand, I think I could fit it into 17 bytes (only one byte is necessary for the discriminant), so both the 20 bytes and the 24 bytes are perplexing to me.
I suspect this might have something to do with alignment, but I don't understand how, or why it would be necessary.
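One way to test the alignment suspicion is to print std::mem::align_of next to size_of. The sketch below condenses the three enum definitions from the question onto single lines and adds #[allow(dead_code)] only to silence unused-field warnings. Exact numbers are target-dependent; on a typical 64-bit target such as x86_64 the alignments come out as 1, 4 and 8, and each size is a multiple of its alignment.

```rust
use std::mem::{align_of, size_of};

// Condensed copies of the enums from the question.
#[allow(dead_code)]
#[derive(Clone, Debug)]
pub enum Smoller { Int(u8), Four([u8; 4]), Eight([u8; 8]), Twelve([u8; 12]), Sixteen([u8; 16]) }

#[allow(dead_code)]
#[derive(Clone, Debug)]
pub enum Smol { Float(f32), Four([u8; 4]), Eight([u8; 8]), Twelve([u8; 12]), Sixteen([u8; 16]) }

#[allow(dead_code)]
#[derive(Clone, Debug)]
pub enum Big { Float(f64), Four([u8; 4]), Eight([u8; 8]), Twelve([u8; 12]), Sixteen([u8; 16]) }

fn main() {
    // Print size and alignment side by side; on x86_64 the alignments are 1, 4 and 8.
    println!("Smoller: size {}, align {}", size_of::<Smoller>(), align_of::<Smoller>());
    println!("Smol:    size {}, align {}", size_of::<Smol>(), align_of::<Smol>());
    println!("Big:     size {}, align {}", size_of::<Big>(), align_of::<Big>());
}
```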
Can someone explain this?
Thanks!
Upvotes: 4
Views: 1112
Reputation: 299999
As mcarton mentions, this is an effect of alignment of internal fields and alignment/size rules.
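Specifically, common alignments for built-in types are: 1 byte for u8 and i8; 2 bytes for u16 and i16; 4 bytes for u32, i32 and f32; 8 bytes for u64, i64 and f64.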
Do note that I say common: in practice, alignment is dictated by the hardware, and on 32-bit architectures you could reasonably expect f64 to be 4-byte aligned. Further, the alignment of isize, usize and pointers varies between 32-bit and 64-bit architectures.
In general, for ease of use, the alignment of a compound type is the largest alignment of any of its fields, recursively.
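A small sketch of the "largest alignment of any field, recursively" rule, using two hypothetical structs Inner and Outer (not from the question). Outer contains no f32 directly, but it inherits the f32's alignment through Inner. This relies on the default (Rust) representation in current rustc, which in practice uses the maximum field alignment.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct: a u8 plus an f32, so its alignment is that of f32.
#[allow(dead_code)]
struct Inner {
    a: u8,
    b: f32,
}

// Hypothetical struct nesting Inner: its alignment comes from Inner, and thus from the f32.
#[allow(dead_code)]
struct Outer {
    flag: u8,
    inner: Inner,
}

fn main() {
    // Both structs end up with the alignment of f32, the most-aligned leaf field.
    println!("Inner: size {}, align {}", size_of::<Inner>(), align_of::<Inner>());
    println!("Outer: size {}, align {}", size_of::<Outer>(), align_of::<Outer>());
}
```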
Access to unaligned values is architecture-specific: on some architectures it will crash (SIGBUS) or return erroneous data, on some it will be slower (x86/x64 not so long ago), and on others it may be just fine (newer x64, for some instructions).
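In C, the size must always be a multiple of the alignment, because of the way arrays are laid out and iterated over: each element starts exactly sizeof(T) bytes after the previous one, so if sizeof(T) were not a multiple of the alignment, some elements would end up misaligned. Rust has inherited this behavior^1.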
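To see that Rust uses size_of as the array stride, this sketch measures the byte distance between two adjacent elements of a [Smol; 2] array. The helper element_stride is hypothetical, written only for this illustration; it compares raw addresses and never dereferences them.

```rust
use std::mem::{align_of, size_of};

// Copy of Smol from the question.
#[allow(dead_code)]
#[derive(Clone, Debug)]
pub enum Smol { Float(f32), Four([u8; 4]), Eight([u8; 8]), Twelve([u8; 12]), Sixteen([u8; 16]) }

/// Hypothetical helper: byte distance between two adjacent array elements.
fn element_stride<T>(arr: &[T; 2]) -> usize {
    let first = &arr[0] as *const T as usize;
    let second = &arr[1] as *const T as usize;
    second - first
}

fn main() {
    let arr = [Smol::Float(1.0), Smol::Sixteen([0; 16])];
    // Adjacent elements are exactly size_of::<Smol>() bytes apart...
    assert_eq!(element_stride(&arr), size_of::<Smol>());
    // ...so size_of must be a multiple of the alignment, or arr[1] would be misaligned.
    assert_eq!(size_of::<Smol>() % align_of::<Smol>(), 0);
    println!("stride = size_of = {}", size_of::<Smol>());
}
```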
It's interesting to note that Swift decided to define a separate intrinsic, strideof, to represent the stride in an array, which allowed them to remove any tail padding from the result of sizeof. It did cause some confusion, as people expected sizeof to behave like C, but it allows memory to be compacted more efficiently.
Thus, in Swift, your enums could be represented as:
Smoller: [u8 x 16][discriminant] => sizeof 17 bytes, strideof 17 bytes, alignof 1 byte.
Smol: [u8 x 16][discriminant] => sizeof 17 bytes, strideof 20 bytes, alignof 4 bytes.
Big: [u8 x 16][discriminant] => sizeof 17 bytes, strideof 24 bytes, alignof 8 bytes.
This clearly shows the difference between size and stride, which C and Rust conflate.
^1 I seem to remember some discussions about possibly switching to a separate strideof, which evidently did not come to fruition, but I could not find a link to them.
Upvotes: 4
Reputation: 30061
The size must be at least 17 bytes, because the biggest variant is 16 bytes long and the enum needs an extra byte for the discriminant. (The compiler can be smart in some cases and put the discriminant in unused bits or invalid values of the variants, but it can't do that here, since every bit pattern of [u8; 16] is valid.)
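A quick sketch contrasting types that have spare values (a "niche") with [u8; 16], which has none. Option<&u8> being the same size as &u8 is documented; Option<bool> being 1 byte and Option<[u8; 16]> being 17 bytes reflect current rustc behavior rather than formal guarantees.

```rust
use std::mem::size_of;

fn main() {
    // A reference is never null, so Option can use null to encode None.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    // bool only uses the values 0 and 1, so None fits in one of the other 254 values.
    assert_eq!(size_of::<Option<bool>>(), 1);
    // Every bit pattern of [u8; 16] is valid: no spare value, so an extra byte is needed.
    assert_eq!(size_of::<Option<[u8; 16]>>(), 17);
    println!("all niche checks passed");
}
```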
Also, the size of Big must be a multiple of 8 bytes to align f64 properly. The smallest multiple of 8 that is at least 17 is 24.
Similarly, Smol cannot be only 17 bytes, because its size must be a multiple of 4 bytes (the alignment of f32), and the smallest such multiple is 20. Smoller only contains u8, so it can be aligned to 1 byte.
Upvotes: 9
Reputation: 98436
I think that it is because of the alignment requirements of the inner values.
u8 has an alignment of 1, so everything works as you expect, and you get a total size of 17 bytes.
But f32 has an alignment of 4 (technically it is architecture-dependent, but that is the most likely value). So even if the discriminant is just 1 byte, you get this layout for Smol::Float:
[discriminant x 1] [padding x 3] [f32 x 4] = 8 bytes
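Rust's default enum layout is unspecified, so this sketch models the Smol::Float layout above with a hypothetical #[repr(C)] struct, FloatVariant, whose field order is fixed. It measures the offset of the f32 by comparing addresses; with a 4-byte-aligned f32 the tag sits at offset 0, followed by 3 bytes of padding and the f32 at offset 4.

```rust
use std::mem::size_of;

// Hypothetical model of Smol::Float: a one-byte tag followed by the f32 payload.
// #[repr(C)] keeps declaration order, so the padding is predictable.
#[allow(dead_code)]
#[repr(C)]
struct FloatVariant {
    tag: u8,
    value: f32,
}

/// Byte offset of `value` from the start of a FloatVariant.
fn value_offset() -> usize {
    let v = FloatVariant { tag: 0, value: 0.0 };
    let base = &v as *const FloatVariant as usize;
    let field = &v.value as *const f32 as usize;
    field - base
}

fn main() {
    println!("value offset: {}, total size: {}", value_offset(), size_of::<FloatVariant>());
}
```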
And then for Smol::Sixteen:
[discriminant x 1] [u8 x 16] [padding x 3] = 20 bytes
Why is this padding necessary? Because a type's size must be a multiple of its alignment; otherwise, elements in an array of this type would be misaligned.
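If you want this layout to be guaranteed rather than left to the compiler, #[repr(u8)] on an enum with fields gives a documented one: each variant is laid out like a #[repr(C)] struct whose first field is a u8 tag, and the enum is as large as the largest such struct, rounded up to the enum's alignment. SmolU8 below is a hypothetical copy of Smol with that attribute; assuming a 4-byte-aligned f32, the Sixteen variant needs 1 + 16 = 17 bytes, which rounds up to 20.

```rust
use std::mem::{align_of, size_of};

// Hypothetical copy of Smol with a fixed, documented layout: u8 tag first in every variant.
#[allow(dead_code)]
#[repr(u8)]
enum SmolU8 {
    Float(f32),
    Four([u8; 4]),
    Eight([u8; 8]),
    Twelve([u8; 12]),
    Sixteen([u8; 16]),
}

fn main() {
    // Largest variant: 1 tag byte + 16 data bytes = 17, rounded up to a multiple of 4 = 20.
    println!("size {}, align {}", size_of::<SmolU8>(), align_of::<SmolU8>());
}
```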
Similarly, the alignment of f64 is 8, so you get a total size of 24, the smallest multiple of 8 that fits all the variants.
Upvotes: 4