Reputation: 12926
From a byte array, I want to convert a slice to a string using ASCII encoding. The solution
fn main() {
    let buffer: [u8; 9] = [255, 255, 255, 255, 77, 80, 81, 82, 83];
    let s = String::from_iter(buffer[5..9].iter().map(|v| { *v as char }));
    println!("{}", s);
    assert_eq!("PQRS", s);
}
does not seem idiomatic and smells of poor performance. Can we do better, without external crates?
Upvotes: 6
Views: 13153
Reputation: 5744
As SirDarius said, you can try core::str::from_utf8. But you need to understand that not every UTF-8 string is an ASCII string: just because a byte array can be interpreted as a UTF-8 string does not mean it can be interpreted as an ASCII string.
In other words, core::str::from_utf8 only gives you an ASCII string if you already know your byte array is truly ASCII.
But in that case it's more efficient to just use core::str::from_utf8_unchecked, as the documentation on from_utf8 says:
If you are sure that the byte slice is valid UTF-8, and you don’t want to incur the overhead of the validity check, there is an unsafe version of this function, from_utf8_unchecked, which has the same behavior but skips the check.
Here's an example where you can get a valid string from an invalid ASCII array:
fn main() {
    let buffer = [ 226, 154, 160 ];
    //             ^^^  ^^^  ^^^ None of these are valid ASCII characters
    let str = core::str::from_utf8(&buffer).unwrap(); // Doesn't panic
    println!("{}", str); // Prints "⚠"
}
Instead you need to first scan the byte array for invalid ASCII characters.
fn get_ascii_str<'a>(buffer: &'a [u8]) -> Result<&'a str, ()> {
    // ASCII only covers the values 0..=127, so reject any byte above that.
    for &byte in buffer {
        if byte >= 128 {
            return Err(());
        }
    }
    Ok(unsafe {
        // SAFETY: we verified above that every byte is ASCII,
        // and all ASCII strings are also valid UTF-8 strings.
        core::str::from_utf8_unchecked(buffer)
    })
}
Note: this function will work in #![no_std] environments.
Example:
fn main() {
    let buffer = [ 226, 154, 160 ]; // UTF8 bytes for "⚠"
    //             ^^^  ^^^  ^^^ None of these are valid ASCII characters
    assert_eq!(Err(()), get_ascii_str(&buffer)); // Correctly fails to interpret as ASCII
    let buffer = [
        'H' as u8,
        'e' as u8,
        'l' as u8,
        'l' as u8,
        'o' as u8,
        ',' as u8,
        ' ' as u8,
        'w' as u8,
        'o' as u8,
        'r' as u8,
        'l' as u8,
        'd' as u8,
        '!' as u8,
    ];
    let str = get_ascii_str(&buffer).unwrap();
    println!("{}", str); // Prints "Hello, world!"
}
fn get_ascii_str<'a>(buffer: &'a [u8]) -> Result<&'a str, ()> {
    // See implementation above
}
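For comparison, the same check can probably be sketched without unsafe, using the is_ascii method that the standard library provides on byte slices. The helper name ascii_str and its Option return type are illustrative choices, not part of the answer above. Note that from_utf8 validates the bytes a second time, which trades a little speed for staying in safe code.

```rust
/// Returns the slice as `&str` if every byte is ASCII, otherwise `None`.
/// `ascii_str` is a hypothetical helper name used only for illustration.
fn ascii_str(buffer: &[u8]) -> Option<&str> {
    if buffer.is_ascii() {
        // ASCII is a subset of UTF-8, so this conversion succeeds here.
        core::str::from_utf8(buffer).ok()
    } else {
        None
    }
}

fn main() {
    let buffer: [u8; 9] = [255, 255, 255, 255, 77, 80, 81, 82, 83];
    assert_eq!(ascii_str(&buffer[5..9]), Some("PQRS"));
    // The UTF-8 bytes for "⚠" are valid UTF-8 but not ASCII.
    assert_eq!(ascii_str(&[226, 154, 160]), None);
}
```

Like get_ascii_str, this only touches core, so it should also be usable in #![no_std] code.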
Upvotes: 1
Reputation: 42899
A Rust string can be directly created from a UTF-8 encoded byte buffer like so:
fn main() {
    let buffer: [u8; 9] = [255, 255, 255, 255, 77, 80, 81, 82, 83];
    let s = std::str::from_utf8(&buffer[5..9]).expect("invalid utf-8 sequence");
    println!("{}", s);
    assert_eq!("PQRS", s);
}
The operation can fail if the input buffer contains an invalid UTF-8 sequence; however, ASCII characters are valid UTF-8, so it works in this case.
Note that here, the type of s is &str, meaning that it is a reference into buffer. No allocation takes place here, so the operation is very efficient.
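One way to check the no-allocation claim is to compare pointers: the &str should start at the same address as the sub-slice it was built from. The sketch below also exercises the failure case, since the byte 255 never occurs in valid UTF-8. The helper name tail_as_str is made up for this illustration.

```rust
/// Interprets `buffer[start..]` as UTF-8 without copying.
/// `tail_as_str` is a hypothetical helper name used only for illustration.
fn tail_as_str(buffer: &[u8], start: usize) -> Result<&str, core::str::Utf8Error> {
    core::str::from_utf8(&buffer[start..])
}

fn main() {
    let buffer: [u8; 9] = [255, 255, 255, 255, 77, 80, 81, 82, 83];

    let s = tail_as_str(&buffer, 5).expect("invalid utf-8 sequence");
    assert_eq!(s, "PQRS");
    // The string slice points into `buffer`; nothing was copied.
    assert_eq!(s.as_ptr(), buffer[5..].as_ptr());

    // Including a 255 byte makes the input invalid UTF-8.
    assert!(tail_as_str(&buffer, 3).is_err());
}
```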
Upvotes: 5