Reputation: 550
The bytes are unsigned and all less than 16, so each fits into a nibble.
I'm currently shifting the bytes in a loop and ANDing them with 0xf:
pub fn compress(offsets: [u8; 8]) -> u32 {
    let mut co: u32 = 0;
    for (i, o) in offsets.iter().enumerate() {
        co |= ((*o as u32) & 0xf) << (i * 4);
    }
    co
}
The compiler already does some good optimization on that.
But maybe it is possible to do some bit twiddling, or to use SIMD instructions on a u64, to reduce the number of operations?
Upvotes: 2
Views: 533
Reputation: 64904
With the bitintr crate you can use pext:
bitintr::bmi2::pext(x, 0x0f0f0f0f0f0f0f0f)
However, that is only fast on Intel processors. AMD Ryzen implements BMI2, but its pext is very slow (microcoded on Zen 1 and Zen 2; Zen 3 and later implement it in hardware).
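For reference, a minimal sketch of how that call could replace the loop, keeping the question's compress signature. The bitintr::bmi2::pext path is taken verbatim from above; the exact module path may differ between bitintr versions, so check the docs of the version you use:
pub fn compress_pext(offsets: [u8; 8]) -> u32 {
    // Pack the eight bytes into one u64; offsets[0] ends up in the lowest byte.
    let x = u64::from_le_bytes(offsets);
    // Gather the low nibble of every byte into the low 32 bits.
    bitintr::bmi2::pext(x, 0x0f0f0f0f0f0f0f0f) as u32
}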
Here is an alternative with only normal code:
pub fn compress(offsets: [u8; 8]) -> u32 {
    let mut x = u64::from_le_bytes(offsets);
    x = (x | (x >> 4)) & 0x00FF00FF00FF00FF;
    x = (x | (x >> 8)) & 0x0000FFFF0000FFFF;
    x = x | (x >> 16);
    x as u32
}
The steps do this (each letter stands for one nibble; offsets[0] is h and offsets[7] is a):
start:          0x0a0b0c0d0e0f0g0h
x | (x >> 4):   0x0aabbccddeeffggh
& mask:         0x00ab00cd00ef00gh
x | (x >> 8):   0x00ababcdcdefefgh
& mask:         0x0000abcd0000efgh
x | (x >> 16):  0x0000abcdabcdefgh
as u32:         0xabcdefgh
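As a quick sanity check (a sketch with made-up nibble values), this version packs offsets[0] into the lowest nibble, the same as the loop in the question:
fn main() {
    // Made-up example values, each below 16.
    let offsets = [0x1u8, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8];
    // offsets[0] lands in the lowest nibble, offsets[7] in the highest.
    assert_eq!(compress(offsets), 0x87654321);
    println!("{:#010x}", compress(offsets)); // prints 0x87654321
}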
Upvotes: 4