The best way to enumerate through a string in Rust? (chars() vs as_bytes())

Question

I'm new to Rust, and I'm learning it using Rust Book.

Recently, I found this function there:

// Returns the number of characters in the first
// word of the given string

fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }

    s.len()
}

As you can see, the authors use the String::as_bytes() method to iterate over the string. They then compare each byte against the byte literal b' ' (a u8) to check whether we've reached the end of the first word.

As far as I know, there is another option, which looks much better:

fn first_word(s: &String) -> usize {
    for (i, item) in s.chars().enumerate() {
        if item == ' ' {
            return i;
        }
    }
    s.len()
}

Here, I'm using the String::chars() method, and the function looks much cleaner.

So the question is: is there any difference between these two approaches? If so, which one is better and why?

effect · Accepted Answer

If your string happens to be purely ASCII (where there is only one byte per character), the two functions should behave identically.

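However, Rust strings are UTF-8 encoded, so a single character can occupy multiple bytes. Using s.chars() should therefore be preferred when you want to work in terms of characters, since it lets your function still behave as expected when the string contains non-ASCII characters. Keep in mind that the two versions then return different numbers: the as_bytes() version returns a byte offset, while the chars() version counts characters up to the space. Also note that s.len() is always a length in bytes, and that slicing a string (for example &s[..i]) expects a byte offset.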
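To make the difference concrete, here is a minimal sketch that runs both approaches on a string containing a non-ASCII character. The function names first_word_bytes, first_word_chars and first_word_char_indices are made up for this illustration, and the parameter is &str rather than &String for convenience. The third variant uses str::char_indices, which is not mentioned in the question; it is shown only as one possible way to iterate by character while still getting a byte offset.

```rust
// Byte-based version, as in the question: returns the byte offset of the first space.
fn first_word_bytes(s: &str) -> usize {
    for (i, &item) in s.as_bytes().iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }
    s.len()
}

// chars()-based version, as in the question: `i` counts characters, not bytes.
// Note that the fallback s.len() is still a length in bytes.
fn first_word_chars(s: &str) -> usize {
    for (i, item) in s.chars().enumerate() {
        if item == ' ' {
            return i;
        }
    }
    s.len()
}

// Illustrative alternative: char_indices() yields (byte offset, char) pairs,
// so the returned index can be used to slice the string.
fn first_word_char_indices(s: &str) -> usize {
    for (i, item) in s.char_indices() {
        if item == ' ' {
            return i;
        }
    }
    s.len()
}
```

For the input "héllo world", where 'é' takes two bytes in UTF-8, the byte-based and char_indices versions both return 6, while the chars() version returns 5. Only the byte offset is suitable for slicing: &s[..6] gives "héllo".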

As @eggyal points out, Rust has a str::split_whitespace method, which returns an iterator over the words of a string and splits on any whitespace (not just spaces). You could use it like so:

fn first_word(s: &String) -> usize {
    if let Some(word) = s.split_whitespace().next() {
        word.len()
    } else {
        s.len()
    }
}
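If what you ultimately need is the word itself rather than its length, the same iterator can hand it back directly. The following is only a sketch under that assumption; the name first_word_slice is hypothetical, and returning an empty string when there is no word is a choice made for this example.

```rust
// Returns the first whitespace-delimited word, or "" if the string
// is empty or contains only whitespace.
fn first_word_slice(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}
```

Because split_whitespace skips leading whitespace and splits on tabs and newlines as well as spaces, first_word_slice("  hi there") gives "hi" and first_word_slice("héllo\tworld") gives "héllo". This differs from the byte-based loop, which would return 0 for a string that starts with a space.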
