Reputation: 1331
I'm getting confused about strings/chars/slices in Rust. According to the documentation, a char is 4 bytes; however, the program below shows that a string of three characters uses only 7 bytes in the slice. It seems that in the slice the characters are stored as compactly as possible, so the plain 'A' is stored as 1 byte and the two kanji characters are stored as 3 bytes each.
fn main() {
    let s = String::from("A漢字");
    let ss = &s[..];
    let sbytes = ss.len();
    let schars = s.chars().count();
    println!("{} is {} characters and {} bytes", ss, schars, sbytes);
}
$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/test_string`
A漢字 is 3 characters and 7 bytes
Upvotes: 0
Views: 1183
Reputation: 1331
I found that for a character c, you can get the number of bytes it will occupy in the slice like this:
let b = c.len_utf8();
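As a rough sketch of how that fits together with the string from the question: summing len_utf8() over every char should give the same number as len() on the slice. The helper name utf8_len_by_char is made up for this illustration and is not part of the standard library.

```rust
/// Hypothetical helper: adds up the UTF-8 length of each char in `s`.
/// For any valid &str this should equal `s.len()`.
fn utf8_len_by_char(s: &str) -> usize {
    s.chars().map(char::len_utf8).sum()
}

fn main() {
    let s = "A漢字";
    // Show the encoded size of each character individually.
    for c in s.chars() {
        println!("{:?} takes {} byte(s) in UTF-8", c, c.len_utf8());
    }
    // The per-char total matches the byte length of the slice.
    assert_eq!(utf8_len_by_char(s), s.len());
}
```

For "A漢字" this adds up as 1 + 3 + 3, which is the 7 bytes reported in the question.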
Upvotes: 2
Reputation: 141
Rust uses the UTF-8 encoding for strings. So a String (and any &str slice borrowed from it) represents its text as a sequence of UTF-8 bytes, not as an array of chars. The 4-byte size applies only to a standalone char value. Here is a little demonstration:
assert_eq!("ಠ".len(), 3);
assert_eq!("ಠ".chars().count(), 1);
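To make the contrast with the 4-byte char explicit, here is a small sketch comparing the UTF-8 size of a string with the space the same text would need if every character were stored as a full char. The function name storage_sizes is an assumption for this example, not something from the standard library.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (bytes used as UTF-8,
/// bytes needed if each character were stored as a 4-byte `char`).
fn storage_sizes(s: &str) -> (usize, usize) {
    let utf8_bytes = s.len();
    let char_bytes = s.chars().count() * size_of::<char>();
    (utf8_bytes, char_bytes)
}

fn main() {
    // A standalone char always occupies 4 bytes.
    assert_eq!(size_of::<char>(), 4);

    let (utf8, fixed) = storage_sizes("A漢字");
    println!("as UTF-8: {} bytes, as chars: {} bytes", utf8, fixed);

    // The raw UTF-8 bytes behind the slice can be inspected directly.
    println!("{:?}", "A漢字".as_bytes());
}
```

So the documentation and the program in the question do not contradict each other: the first describes the char type, the second measures the UTF-8 encoded string.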
Upvotes: 2