Peter Prographo

Reputation: 1331

How many bytes does a char take up when stored in a string slice?

I'm confused about strings, chars, and slices in Rust. According to the documentation, a char is 4 bytes, yet the program below shows that a string of three characters uses 7 bytes in the slice. It seems that the characters are stored as compactly as possible in the slice: the plain 'A' takes 1 byte, and the two kanji take 3 bytes each.

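fn main() {
    let s = String::from("A漢字");
    // Borrow the whole String as a string slice (&str).
    let ss = &s[..];
    // len() on a str returns the length in bytes, not in characters.
    let sbytes = ss.len();
    // chars() iterates over Unicode scalar values.
    let schars = s.chars().count();
    println!("{} is {} characters and {} bytes", ss, schars, sbytes);
}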
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/test_string`
A漢字 is 3 characters and 7 bytes

Upvotes: 0

Views: 1183

Answers (2)

Peter Prographo

Reputation: 1331

I found that for a character c, you can get the number of bytes it occupies in the slice like this:

let b = c.len_utf8();
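As a minimal sketch of how this applies to the string from the question, the snippet below lists the UTF-8 size of each character and checks that the sizes add up to the slice length. The helper name `utf8_sizes` is made up for illustration; only `chars()`, `len_utf8()`, and `len()` come from the standard library.

```rust
/// Hypothetical helper: pairs each character of `s` with the number of
/// bytes it occupies when encoded as UTF-8.
fn utf8_sizes(s: &str) -> Vec<(char, usize)> {
    s.chars().map(|c| (c, c.len_utf8())).collect()
}

fn main() {
    let s = "A漢字";
    for (c, n) in utf8_sizes(s) {
        println!("{} takes {} byte(s)", c, n);
    }
    // The per-character sizes sum to the byte length of the slice.
    let total: usize = utf8_sizes(s).iter().map(|&(_, n)| n).sum();
    assert_eq!(total, s.len());
}
```

For "A漢字" this gives 1 + 3 + 3 = 7, which matches the 7 bytes reported in the question.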

Upvotes: 2

Kuznero

Reputation: 141

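Rust strings are UTF-8 encoded, so a String stores its text as a sequence of UTF-8 bytes, not as an array of chars. The 4-byte figure in the documentation is the size of a char value itself; inside a string, each character takes 1 to 4 bytes depending on its code point. Here is a small demonstration: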

assert_eq!("ಠ".len(), 3);
assert_eq!("ಠ".chars().count(), 1);
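To make the difference between the two representations concrete, here is a rough sketch using the string from the question. It compares the fixed size of a `char` value with the variable-width UTF-8 layout inside a `&str`. The helper name `char_offsets` is an illustrative assumption, not part of the standard library.

```rust
use std::mem::size_of;

/// Hypothetical helper: the byte offset at which each character starts
/// inside the UTF-8 encoded string.
fn char_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

fn main() {
    // A standalone char is always 4 bytes in memory.
    assert_eq!(size_of::<char>(), 4);

    let s = "A漢字";
    // Inside the string the same text is packed as UTF-8: 1 + 3 + 3 bytes.
    assert_eq!(s.len(), 7);

    // Collecting into a Vec<char> spends 4 bytes per character instead.
    let chars: Vec<char> = s.chars().collect();
    assert_eq!(chars.len() * size_of::<char>(), 12);

    // Characters start at byte offsets 0, 1, and 4.
    println!("{:?}", char_offsets(s));
}
```

Because characters have different widths, indexing or slicing a `&str` works on byte offsets, and a slice boundary has to fall on one of the offsets returned by `char_indices()`.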

Upvotes: 2
