Wolfgang Kuehn

Reputation: 12926

Idiomatic way to convert ascii array to string in Rust

I want to convert a slice of a byte array to a string, interpreting the bytes as ASCII. The solution

fn main() {
    let buffer: [u8; 9] = [255, 255, 255, 255, 77, 80, 81, 82, 83];
    let s = String::from_iter(buffer[5..9].iter().map(|v| { *v as char }));
    println!("{}", s);
    assert_eq!("PQRS", s);
}

does not seem idiomatic and smells of poor performance. Can we do better without external crates?

Upvotes: 6

Views: 13153

Answers (2)

Matt Thomas

Reputation: 5744

As SirDarius said, you can use core::str::from_utf8. But not every UTF-8 string is an ASCII string: just because a byte array can be interpreted as UTF-8 does not mean it can be interpreted as ASCII.

In other words, core::str::from_utf8 only gives you an ASCII string if you already know your byte array is truly ASCII.

But in that case it's more efficient to just use core::str::from_utf8_unchecked, as the documentation on from_utf8 says:

If you are sure that the byte slice is valid UTF-8, and you don’t want to incur the overhead of the validity check, there is an unsafe version of this function, from_utf8_unchecked, which has the same behavior but skips the check.

Here's an example where from_utf8 returns a valid string from a byte array that is not valid ASCII:

fn main() {
    let buffer = [ 226, 154, 160 ];
    //             ^^^  ^^^  ^^^ None of these are valid ASCII characters
    let str = core::str::from_utf8(&buffer).unwrap(); // Doesn't panic
    println!("{}", str); // Prints "⚠"
}


Instead you need to first scan the byte array for invalid ASCII characters.

Solution

fn get_ascii_str<'a>(buffer: &'a [u8]) -> Result<&'a str, ()> {
    // ASCII covers 0..=127, so any byte with the high bit set is rejected
    for &byte in buffer {
        if byte >= 128 {
            return Err(());
        }
    }
    Ok(unsafe {
        // This is safe because we verified above that it's a valid ASCII
        // string, and all ASCII strings are also UTF8 strings
        core::str::from_utf8_unchecked(buffer)
    })
}

Note: this function uses only core, so it also works in #![no_std] environments.
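
The same scan can also be written with the standard library's is_ascii method on byte slices, which removes the need for an unsafe block. What follows is only a sketch under that assumption: the name ascii_str_checked is made up for illustration, and it accepts a second validation pass inside from_utf8 in exchange for staying in safe code.

```rust
// Hypothetical helper, not part of the original answer.
// Returns the bytes as &str only if every byte is in the ASCII range.
fn ascii_str_checked(buffer: &[u8]) -> Option<&str> {
    if buffer.is_ascii() {
        // Every ASCII byte sequence is valid UTF-8, so from_utf8
        // succeeds here; .ok() just converts the Result to an Option.
        core::str::from_utf8(buffer).ok()
    } else {
        None
    }
}
```

Calling ascii_str_checked(b"PQRS") yields Some("PQRS"), while the three bytes of "⚠" from the example above yield None.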

Example:

fn main() {
    let buffer = [ 226, 154, 160 ]; // UTF8 bytes for "⚠"
    //             ^^^  ^^^  ^^^ None of these are valid ASCII characters
    assert_eq!(Err(()), get_ascii_str(&buffer)); // Correctly fails to interpret as ASCII
    let buffer = *b"Hello, world!"; // Byte string literal, a [u8; 13]
    let str = get_ascii_str(&buffer).unwrap();
    println!("{}", str); // Prints "Hello, world!"
}

fn get_ascii_str<'a>(buffer: &'a [u8]) -> Result<&'a str, ()> {
    // See implementation above
}


Upvotes: 1

SirDarius

Reputation: 42899

A Rust string can be directly created from a UTF-8 encoded byte buffer like so:

fn main() {
    let buffer: [u8; 9] = [255, 255, 255, 255, 77, 80, 81, 82, 83];
    let s = std::str::from_utf8(&buffer[5..9]).expect("invalid utf-8 sequence");
    println!("{}", s);
    assert_eq!("PQRS", s);
}

The operation can fail if the input buffer contains an invalid UTF-8 sequence. However, every ASCII character is valid UTF-8, so it works in this case.
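
To illustrate the failure case without panicking, here is a small sketch that reports where decoding stopped. The helper name decode_or_report is hypothetical; it relies on Utf8Error::valid_up_to, which gives the index up to which the input was valid UTF-8.

```rust
use std::str;

// Hypothetical helper, named for illustration only.
// On failure, returns the index of the first byte that is not valid UTF-8.
fn decode_or_report(bytes: &[u8]) -> Result<&str, usize> {
    str::from_utf8(bytes).map_err(|e| e.valid_up_to())
}
```

With the buffer from the question, the slice [5..9] decodes to "PQRS", whereas the whole buffer is rejected at index 0 because 255 never occurs in UTF-8.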

Note that the type of s here is &str, meaning it borrows from buffer. No allocation takes place, so the only cost is the validation pass over the bytes.
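
If the text has to outlive buffer, the borrowed &str can be copied into an owned String, and that copy is the only place an allocation occurs. A minimal sketch, assuming an owned result is what you want; the helper name owned_from_bytes is made up for illustration.

```rust
// Hypothetical helper, not part of the original answer.
// Validates the bytes as UTF-8, then copies them into a new String.
fn owned_from_bytes(bytes: &[u8]) -> Option<String> {
    std::str::from_utf8(bytes).ok().map(str::to_owned)
}
```

The returned String no longer borrows from the input, so it can be returned from a scope in which the byte buffer is dropped.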


Upvotes: 5
