ElementalX

Reputation: 171

What is the purpose of `b` here?

In this code:

fn main() {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::Hasher;
    
    let mut hasher = DefaultHasher::new();
    
    hasher.write_u32(1989);
    hasher.write_u8(11);
    hasher.write_u8(9);
    hasher.write(b"Huh?"); // <--------
    
    println!("Hash is {:x}!", hasher.finish());
}

What's the point of b? What does it do?

Upvotes: 15

Views: 12165

Answers (2)

kbolino

Reputation: 1754

"some string":

  • is a &str which can be converted easily to &[u8] as UTF-8
  • must result in a sequence of bytes which is valid UTF-8
  • can contain any ASCII or Unicode character
  • can use ASCII escape sequences (\x00 to \x7F) and Unicode escape sequences (\u{0} to \u{10FFFF})

b"some string":

  • is a &[u8; N] (*) which must be validated as UTF-8 to safely convert to &str
  • may result in any arbitrary sequence of bytes
  • can contain only ASCII characters
  • can use full byte escape sequences (\x00 to \xFF)

For example, b"\xFF" is valid while "\xFF" is not, because the byte FF in hex (255 in decimal) is not allowed anywhere in UTF-8. Similarly, "😊" is valid while b"😊" is not, because emoji are part of Unicode and not ASCII.

As an interesting side note, "\u{FF}" (which is the same as "ÿ") does not convert to b"\xFF" but rather to b"\xC3\xBF", because UTF-8 uses multiple bytes to encode characters outside of ASCII.
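A minimal sketch of the points above, using only the standard library. The helper names utf8_bytes and is_valid_utf8 are made up for illustration:

```rust
use std::str;

/// Hypothetical helper: copies the UTF-8 encoding of a string into a Vec.
fn utf8_bytes(s: &str) -> Vec<u8> {
    s.as_bytes().to_vec()
}

/// Hypothetical helper: reports whether the bytes form valid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    // U+00FF is outside ASCII, so UTF-8 encodes it as two bytes.
    assert_eq!(utf8_bytes("\u{FF}"), vec![0xC3, 0xBF]);

    // The two-byte sequence converts back to a &str without trouble.
    assert!(is_valid_utf8(b"\xC3\xBF"));

    // A lone 0xFF byte is fine in a byte string but is not valid UTF-8.
    assert!(!is_valid_utf8(b"\xFF"));
}
```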

(*) = &[u8; N] (array) is similar to &[u8] (slice), but it also encodes the length N as part of the type (e.g. N is 11 for b"some string"). The distinction often doesn't matter, as the former coerces to the latter. More details on the distinction can be found in an answer to a different question.
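As a rough illustration of that coercion, the sketch below passes a byte string literal to a function that expects a slice. The function byte_len is a hypothetical helper, not part of any library:

```rust
/// Hypothetical helper: accepts any byte slice and returns its length.
fn byte_len(bytes: &[u8]) -> usize {
    bytes.len()
}

fn main() {
    // The literal's type carries its length: 11 bytes here.
    let array: &[u8; 11] = b"some string";

    // A reference to an array coerces to a slice where one is expected.
    let slice: &[u8] = array;

    assert_eq!(byte_len(array), 11);
    assert_eq!(slice.len(), 11);
}
```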

Upvotes: 11

Arijit Dey

Reputation: 406

Any string prefixed by a b tells the compiler that the string should be treated as a byte sequence. This is called a byte string literal.

You can read more about it in The Rust Reference. In short, a string in Rust is a sequence of Unicode characters stored as valid UTF-8, so it can always be viewed as a &[u8] (a slice of unsigned 8-bit integers). A byte string literal gives you such a sequence of bytes directly.

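The hasher.write(...) function takes a &[u8], that is, a sequence of bytes, as its parameter. Prefixing the literal with b makes it a byte string, so it can be passed directly. The prefix only works on literals; to get the bytes of an existing &str, call .as_bytes() on it.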
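Here is a small sketch comparing the two ways of feeding bytes to the hasher from the question. It assumes only the standard library; hash_bytes is a hypothetical helper introduced for the comparison:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical helper: hashes a byte slice with a fresh DefaultHasher.
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

fn main() {
    // A byte string literal is already bytes.
    let from_literal = hash_bytes(b"Huh?");

    // A regular string exposes the same bytes through as_bytes().
    let from_str = hash_bytes("Huh?".as_bytes());

    // For ASCII-only text the two byte sequences are identical.
    assert_eq!(from_literal, from_str);
    println!("Hash is {:x}!", from_literal);
}
```

For ASCII text the choice is mostly one of convenience. The byte string form matters when the data contains bytes that are not valid UTF-8.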

Upvotes: 18
