ccleve

Convert bytes to u64

I need to convert the first 8 bytes of a String in Rust to a u64, big endian. This code almost works:

    fn main() {
        let s = String::from("01234567");
        let mut buf = [0u8; 8];
        buf.copy_from_slice(s.as_bytes());
        let num = u64::from_be_bytes(buf);
        println!("{:X}", num);
    }

There are multiple problems with this code. First, it only works if the string is exactly 8 bytes long: .copy_from_slice() requires the source and destination to have the same length, and panics otherwise. That is easy to deal with if the String is too long, because I can just take a slice of the right length, but it doesn't help if the String is too short.
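A minimal sketch of the two cases described above, using a hypothetical helper copy_exact that wraps the question's copy. Slicing fixes the too-long case, while the too-short case still panics (caught here with std::panic::catch_unwind purely to demonstrate it).

```rust
use std::panic;

// Hypothetical helper wrapping the question's copy: it needs exactly
// 8 bytes, because copy_from_slice panics when the lengths differ.
fn copy_exact(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    u64::from_be_bytes(buf)
}

fn main() {
    // Too long: taking the first 8 bytes makes the lengths match.
    let long = String::from("0123456789");
    println!("{:X}", copy_exact(&long.as_bytes()[..8])); // 3031323334353637

    // Too short: there is no 8-byte slice to take, and copying the
    // 3 bytes that do exist panics inside copy_from_slice.
    let short = String::from("abc");
    let result = panic::catch_unwind(|| copy_exact(short.as_bytes()));
    assert!(result.is_err());
}
```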

Another problem is that this code is part of a very performance-sensitive function that runs in a tight loop over a large data set.

In C, I would just zero the buf, memcpy over the right number of bytes, and do a cast to an unsigned long.
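For comparison, a literal Rust translation of that C recipe, using std::ptr::copy_nonoverlapping (the memcpy counterpart) inside an unsafe block. The name convert_like_c is hypothetical. A plain pointer cast in C reads the bytes in the machine's native order (little endian on x86), so to get big endian this sketch still finishes with u64::from_be_bytes. It is only a side-by-side illustration; there is no particular reason to expect the unsafe version to be faster than safe slice copies, which the compiler can also lower to a memcpy call.

```rust
use std::ptr;

// Hypothetical helper mirroring the C recipe: zero a buffer, memcpy up to
// 8 bytes into it, then interpret the buffer as an integer.
fn convert_like_c(s: &str) -> u64 {
    let mut buf = [0u8; 8]; // zero the buffer
    let len = s.len().min(8); // copy at most 8 bytes
    // SAFETY: len <= s.len() and len <= buf.len(), and the regions cannot
    // overlap because buf is a fresh local array.
    unsafe {
        ptr::copy_nonoverlapping(s.as_ptr(), buf.as_mut_ptr(), len);
    }
    // The question wants big endian, so say so explicitly instead of relying
    // on native byte order the way a C pointer cast would.
    u64::from_be_bytes(buf)
}

fn main() {
    println!("{:X}", convert_like_c("01234567")); // 3031323334353637
    println!("{:X}", convert_like_c("AB")); // 4142000000000000
}
```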

Is there some way to do this in Rust which runs just as fast?

Kevin Reid

You can just modify your existing code to take the length into account when copying:

    let len = 8.min(s.len());
    buf[..len].copy_from_slice(&s.as_bytes()[..len]);

If the string is shorter than 8 bytes, its bytes end up in the most significant bytes of the u64 and the remaining low-order bytes stay zero, of course.
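To make the byte layout concrete, here is the same two-line copy wrapped in a hypothetical helper padded that returns the zero-padded buffer. It also shows why slicing s.as_bytes() rather than s itself matters for non-ASCII input: &s[..8] would panic if byte 8 fell inside a multi-byte UTF-8 character, while the byte slice simply takes the first 8 bytes.

```rust
// Hypothetical helper: the answer's copy, returning the zero-padded buffer.
fn padded(s: &str) -> [u8; 8] {
    let mut buf = [0u8; 8];
    let len = s.len().min(8);
    buf[..len].copy_from_slice(&s.as_bytes()[..len]);
    buf
}

fn main() {
    // "AB" is shorter than 8 bytes: its bytes go to the front of the buffer,
    // which from_be_bytes treats as the most significant end.
    println!("{:?}", padded("AB")); // [65, 66, 0, 0, 0, 0, 0, 0]
    println!("{:016X}", u64::from_be_bytes(padded("AB"))); // 4142000000000000

    // 7 ASCII bytes + 'é' (2 bytes in UTF-8) = 9 bytes. Index 8 is in the
    // middle of 'é', so &s[..8] would panic, but the byte slice copies 0xC3.
    let s = "aaaaaaa\u{e9}";
    assert!(!s.is_char_boundary(8));
    println!("{:016X}", u64::from_be_bytes(padded(s))); // 61616161616161C3
}
```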

As to performance: in this simple test main(), the conversions are completely optimized out and replaced by a constant integer. So, to see the code that actually performs the conversion, we need to put it in a separate function (or a loop):

    pub fn convert(s: &str) -> u64 {
        let mut buf = [0u8; 8];                            // zero-filled buffer
        let len = 8.min(s.len());                          // copy at most 8 bytes
        buf[..len].copy_from_slice(&s.as_bytes()[..len]);
        u64::from_be_bytes(buf)                            // big-endian interpretation
    }

On the Rust Playground, this generates the following assembly:

    playground::convert:
        pushq   %rax
        movq    %rdi, %rax
        movq    $0, (%rsp)
        cmpq    $8, %rsi
        movl    $8, %edx
        cmovbq  %rsi, %rdx
        movq    %rsp, %rdi
        movq    %rax, %rsi
        callq   *memcpy@GOTPCREL(%rip)
        movq    (%rsp), %rax
        bswapq  %rax
        popq    %rcx
        retq

I'm a little skeptical that the memcpy call is actually a better idea than just emitting instructions to copy the bytes, but I'm no expert on instruction-level performance, and presumably it will at least equal your C code that explicitly calls memcpy(). What we do see is that the compiled code contains no branches, only a conditional move (cmovbq), presumably to choose between 8 and len(), and no bounds-check panic.

(And the generated assembly will of course be different — hopefully for the better — when this function or code snippet is inlined into a larger loop.)
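Since the real use case is a tight loop, and since the memcpy call is the open question, here is a sketch for checking a memcpy-free variant on your own machine. convert_shift and checksum are hypothetical names, the input data is made up, and std::hint::black_box (stable since Rust 1.66) is used so the optimizer cannot fold everything into a constant the way it did with the test main(). The sketch only checks that the two versions agree; it does not measure which is faster, so inspect the assembly or time it with a benchmark harness on your target.

```rust
use std::hint::black_box;

// The answer's function: zero-padded buffer, then big-endian conversion.
pub fn convert(s: &str) -> u64 {
    let mut buf = [0u8; 8];
    let len = s.len().min(8);
    buf[..len].copy_from_slice(&s.as_bytes()[..len]);
    u64::from_be_bytes(buf)
}

// Hypothetical memcpy-free variant: shift each of the first (up to) 8 bytes
// into place. Byte i lands in bits 56-8i..=63-8i, i.e. big-endian order.
pub fn convert_shift(s: &str) -> u64 {
    let mut n = 0u64;
    for (i, &b) in s.as_bytes().iter().take(8).enumerate() {
        n |= u64::from(b) << (56 - 8 * i);
    }
    n
}

// Hypothetical stand-in for "a tight loop over a large data set":
// combine every converted key so none of the work can be skipped.
pub fn checksum<F: Fn(&str) -> u64>(data: &[String], f: F) -> u64 {
    data.iter().fold(0u64, |acc, s| acc.wrapping_add(f(s.as_str())))
}

fn main() {
    // Made-up input: keys of varying length, many shorter than 8 bytes.
    let data: Vec<String> = (0..100_000).map(|i| format!("k{}", i)).collect();

    // black_box hides the input from the optimizer.
    let a = checksum(black_box(&data), convert);
    let b = checksum(black_box(&data), convert_shift);
    assert_eq!(a, b);
    println!("{:X}", a);
}
```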
