Reputation: 2452
I am attempting to index a string in Rust, but the compiler throws an error. My code (Project Euler problem 4, playground):
fn is_palindrome(num: u64) -> bool {
let num_string = num.to_string();
let num_length = num_string.len();
for i in 0 .. num_length / 2 {
if num_string[i] != num_string[(num_length - 1) - i] {
return false;
}
}
true
}
The error:
error[E0277]: the trait bound `std::string::String: std::ops::Index<usize>` is not satisfied
--> <anon>:7:12
|
7 | if num_string[i] != num_string[(num_length - 1) - i] {
| ^^^^^^^^^^^^^
|
= note: the type `std::string::String` cannot be indexed by `usize`
Is there a reason why String cannot be indexed? How can I access the data then?
Upvotes: 166
Views: 182684
Reputation: 128111
Yes, indexing into a string with a single integer is not available in Rust. Rust strings are encoded in UTF-8 internally, so the concept of indexing itself would be ambiguous, and people would misuse it. Byte indexing is fast but almost always incorrect: when your text contains non-ASCII symbols, a byte index may land in the middle of a character, which is really bad if you need text processing. Char indexing is not free, because UTF-8 is a variable-length encoding and you have to traverse the string from its start to find the required code point.
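To make the byte/code point distinction concrete, here is a small sketch using only standard library methods. The helper name byte_and_char_len and the sample strings are made up for illustration.

```rust
/// Returns (length in bytes, number of code points) for a string.
/// Hypothetical helper, not part of the standard library.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let ascii = "abc";
    // 'a' takes 1 byte, U+00E9 (e with acute) 2 bytes, U+20AC (euro sign) 3 bytes.
    let mixed = "a\u{e9}\u{20ac}";

    println!("{:?}", byte_and_char_len(ascii)); // prints (3, 3)
    println!("{:?}", byte_and_char_len(mixed)); // prints (6, 3)

    // char_indices pairs every code point with the byte offset where it starts.
    for (offset, ch) in mixed.char_indices() {
        println!("byte offset {}: {}", offset, ch);
    }

    // Byte offset 2 is the second byte of U+00E9, so it is not a char boundary.
    assert!(!mixed.is_char_boundary(2));
}
```

For ASCII-only text the two counts agree, which is why byte indexing happens to work there.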
If you are certain that your strings contain ASCII characters only, you can use the as_bytes() method on &str, which returns a byte slice, and then index into this slice:
let num_string = num.to_string();
// ...
let b: u8 = num_string.as_bytes()[i];
let c: char = b as char; // if you need to get the character as a unicode code point
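Applied to the function from the question, this could look roughly like the sketch below. It relies on the assumption that the decimal representation of a u64 contains ASCII digits only, so each byte is one character.

```rust
fn is_palindrome(num: u64) -> bool {
    let num_string = num.to_string();
    // Safe to compare bytes here: decimal digits are always single-byte ASCII.
    let bytes = num_string.as_bytes();
    let len = bytes.len();
    // Compare the i-th byte with the i-th byte from the end.
    (0..len / 2).all(|i| bytes[i] == bytes[len - 1 - i])
}

fn main() {
    println!("{}", is_palindrome(9009)); // prints true
    println!("{}", is_palindrome(9010)); // prints false
}
```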
If you do need to index code points, you have to use the chars() iterator:
num_string.chars().nth(i).unwrap()
As I said above, this requires traversing the iterator up to the i-th code point.
Finally, in many cases of text processing, it is actually necessary to work with grapheme clusters rather than with code points or bytes. With the help of the unicode-segmentation crate, you can index into grapheme clusters as well:
use unicode_segmentation::UnicodeSegmentation;
let string: String = ...;
UnicodeSegmentation::graphemes(&string, true).nth(i).unwrap()
Naturally, grapheme cluster indexing has the same cost as indexing into code points: the string has to be traversed from its start.
Upvotes: 211
Reputation: 137
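Indexing on strings is possible, just not with single-valued scalars. Range objects work on String and &str. A "single-valued" range object (one with length one) is valid as well. Note that the range bounds are byte offsets, and slicing panics if a bound does not fall on a character boundary. Playground link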
fn main() {
let str1 = "lorem ipsum";
let string2 = String::from(str1);
println!("{}:{} {}:{}", &str1[..1], &str1[1..5], &string2[6..7], &string2[7..]);
}
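Since range bounds are byte offsets, a range that cuts through a multi-byte character panics at runtime. A minimal sketch of a non-panicking variant using the standard str::get method follows; the wrapper name safe_slice is hypothetical.

```rust
/// Returns the sub-slice for a byte range, or None if the range is out of
/// bounds or does not start and end on char boundaries.
/// Hypothetical wrapper around the standard `str::get`.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    // U+00E9 (e with acute) occupies bytes 1 and 2 of this string.
    let s = "h\u{e9}llo";

    println!("{:?}", safe_slice(s, 0, 1)); // prints Some("h")
    println!("{:?}", safe_slice(s, 1, 2)); // prints None
    // Writing &s[1..2] instead would panic, because byte 2 is inside a character.
}
```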
Upvotes: 2
Reputation: 49709
There are two reasons indexing does not work in Rust:
In Rust, strings are stored as a collection of UTF-8 encoded bytes. In memory, strings are just collections of 1s and 0s. A program needs to be able to interpret those 1s and 0s and print out the correct characters. That's where encoding comes into play.
fn main(){
let sample: String = String::from("2bytesPerChar");
// We could do this in many higher-level languages. In Rust it is a compile error: String cannot be indexed by an integer.
let c: char = sample[0];
}
A String is a collection of bytes, so what is the length of our "2bytesPerChar"? It depends on what we count, because a single char can be 1 to 4 bytes long. Assume that the first character takes 2 bytes. If we could get the first char of the string using indexing, sample[0] would refer to the first byte, which is only half of the first character.
The second reason is that there are three ways to look at a string: as bytes, as scalar values, or as grapheme clusters. If we use indexing, Rust does not know which of these we expect to receive, so we have to use more specific methods.
Return bytes:
for b in "dsfsd".bytes(){
// the bytes method returns an iterator over the bytes; here we iterate over every byte and print it
println!("{}",b)
}
Return scalar values:
// we can iterate over scalar values using the chars method
for c in "kjdskj".chars(){
println!("{}",c)
}
In order to keep the Rust standard library lean, the ability to iterate over grapheme clusters is not included by default. We need to import a crate:
# in Cargo.toml
[dependencies]
unicode-segmentation="1.7.1"
then:
use unicode_segmentation::UnicodeSegmentation;
// we pass true to get extended grapheme clusters
for g in "dada".graphemes(true){
println!("{}",g)
}
Upvotes: 1
Reputation: 51
Indexing on String is not allowed because (please check the book):
So if your input doesn't contain diacritics (which are considered separate characters) and it's OK to approximate a letter with a char, you can use the chars() iterator and the DoubleEndedIterator trait for a two-pointer approach:
fn is_palindrome(num: u64) -> bool {
let s = num.to_string();
let mut iterator = s.chars();
loop {
let ch = iterator.next();
let ch_end = iterator.next_back();
if ch.is_none() || ch_end.is_none() {
break;
}
if ch.unwrap() != ch_end.unwrap() {
return false
}
}
true
}
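The same two-pointer idea can be written more compactly by comparing the forward and the reversed chars() iterators. This is a sketch of an alternative, not the code above; the name is_palindrome_chars is made up, and it compares code points, so the caveat about diacritics still applies.

```rust
/// Checks whether a string reads the same forwards and backwards,
/// comparing code point by code point. Hypothetical helper name.
fn is_palindrome_chars(s: &str) -> bool {
    // Chars is a DoubleEndedIterator, so rev() walks the string from the end.
    s.chars().eq(s.chars().rev())
}

fn main() {
    println!("{}", is_palindrome_chars(&9009u64.to_string())); // prints true
    println!("{}", is_palindrome_chars(&9010u64.to_string())); // prints false
}
```

Unlike the loop above, this compares every position twice (once from each side), which is simpler to read but does roughly double the comparisons.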
Upvotes: 4
Reputation: 1140
The code below works fine. I am not sure about its performance and big-O complexity; hopefully someone can add more information about this solution.
fn is_palindrome(num: u64) -> bool {
let num_string = num.to_string();
let num_length = num_string.len();
for i in 0..num_length / 2 {
let left = &num_string[i..i + 1];
let right = &num_string[((num_length - 1) - i)..num_length - i];
if left != right {
return false;
}
}
true
}
Upvotes: 1
Reputation: 5318
This is not suitable for all uses by any means, but if you just need to reference the previous character (or, with a little rework, the next character), then it's possible to do so without iterating through the entire str.
The scenario here is that there is a str slice, string, and a pattern was found in the slice. I want to know the character immediately before the pattern.
Call prev_char like prev_char(string.as_bytes(), pattern_index)
where pattern_index is the index of the first byte of the pattern in string.
UTF-8 encoding is well defined, and this works just by backing up until it finds one of the starting bytes (either high-order bit 0 or high-order bits 11) and then converting that 1-4 byte [u8] slice to a str.
This code just unwraps the conversion because the pattern was found in a valid UTF-8 str to begin with, so no error is possible. If your data has not been validated, it might be best to return a Result rather than an Option.
enum PrevCharStates {
Start,
InEncoding,
}
fn prev_char(bytes: &[u8], starting_index: usize) -> Option<&str> {
let mut ix = starting_index;
let mut state = PrevCharStates::Start;
while ix > 0 {
ix -= 1;
let byte = bytes[ix];
match state {
PrevCharStates::Start => {
if byte & 0b10000000 == 0 {
return Some(std::str::from_utf8(&bytes[ix..starting_index]).unwrap());
} else if byte & 0b11000000 == 0b10000000 {
state = PrevCharStates::InEncoding;
}
},
PrevCharStates::InEncoding => {
if byte & 0b11000000 == 0b11000000 {
return Some(std::str::from_utf8(&bytes[ix..starting_index]).unwrap());
} else if byte & 0b11000000 != 0b10000000 {
return None;
}
}
}
}
None
}
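For comparison, a similar lookup can be sketched with standard library methods alone, since the chars() iterator can decode from the back of a slice. This is an alternative under stated assumptions, not the code above: the name prev_char_std is hypothetical, and it returns a char rather than a &str.

```rust
/// Returns the character that ends right before `byte_index`, or None if
/// `byte_index` is 0, out of bounds, or not on a char boundary.
/// Hypothetical helper built on `str::get` and `Chars::next_back`.
fn prev_char_std(s: &str, byte_index: usize) -> Option<char> {
    // get(..byte_index) yields None instead of panicking on a bad index.
    // next_back() decodes only the last character of the prefix.
    s.get(..byte_index)?.chars().next_back()
}

fn main() {
    let text = "caf\u{e9}!"; // U+00E9 occupies bytes 3 and 4
    let pattern_index = text.find('!').unwrap();
    println!("{:?}", prev_char_std(text, pattern_index)); // prints Some('é')
}
```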
Upvotes: 1
Reputation: 1103
You can convert a String or &str to a Vec of chars and then index that Vec.
For example:
fn main() {
let s = "Hello world!";
let my_vec: Vec<char> = s.chars().collect();
println!("my_vec[0]: {}", my_vec[0]);
println!("my_vec[1]: {}", my_vec[1]);
}
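As a sketch of how this applies to the palindrome check from the question: collecting costs one O(n) pass and an allocation, after which each index access is O(1). The function name is_palindrome_vec is made up for illustration.

```rust
/// Palindrome check that collects the code points into a Vec first,
/// so that plain integer indexing works. Hypothetical helper name.
fn is_palindrome_vec(s: &str) -> bool {
    let chars: Vec<char> = s.chars().collect();
    let len = chars.len();
    // Compare the i-th char with the i-th char from the end.
    (0..len / 2).all(|i| chars[i] == chars[len - 1 - i])
}

fn main() {
    println!("{}", is_palindrome_vec("9009")); // prints true
    println!("{}", is_palindrome_vec("Hello world!")); // prints false
}
```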
Here you have a live example
Upvotes: 25
Reputation: 21746
If what you are looking for is something similar to an index, you can use .chars() and .nth() on a string.
.chars() -> Returns an iterator over the chars of a string slice.
.nth() -> Returns the nth element of the iterator, in an Option.
Now you can use the above in several ways, for example:
let s: String = String::from("abc");
// If you are sure the index is in range (panics otherwise)
println!("{}", s.chars().nth(x).unwrap());
// Same, but with a custom panic message
println!("{}", s.chars().nth(x).expect("message"));
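Both unwrap() and expect() panic when the index is out of range. If that can happen, the Option can be handled explicitly instead; a minimal sketch, where the helper name char_at and the fallback character are assumptions made for illustration:

```rust
/// Returns the code point at position `index`, or None if the string is shorter.
/// Hypothetical helper; walks the string from its start, so it is O(index).
fn char_at(s: &str, index: usize) -> Option<char> {
    s.chars().nth(index)
}

fn main() {
    let s = String::from("abc");

    // Handle both cases without panicking.
    match char_at(&s, 1) {
        Some(c) => println!("found {}", c), // prints found b
        None => println!("index out of range"),
    }

    // Or substitute a fallback value when the index is out of range.
    let c = char_at(&s, 10).unwrap_or('?');
    println!("{}", c); // prints ?
}
```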
Upvotes: 39
Reputation: 90882
The correct approach to doing this sort of thing in Rust is not indexing but iteration. The main problem here is that Rust's strings are encoded in UTF-8, a variable-length encoding for Unicode characters. Being variable in length, the memory position of the nth character can't be determined without looking at the string. This also means that accessing the nth character has a runtime of O(n)!
In this special case, you can iterate over the bytes, because your string is known to only contain the characters 0–9 (iterating over the characters is the more general solution but is a little less efficient).
Here is some idiomatic code to achieve this (playground):
fn is_palindrome(num: u64) -> bool {
let num_string = num.to_string();
let half = num_string.len() / 2;
num_string.bytes().take(half).eq(num_string.bytes().rev().take(half))
}
We go through the bytes in the string both forwards (num_string.bytes().take(half)) and backwards (num_string.bytes().rev().take(half)) simultaneously; the .take(half) part is there to halve the amount of work done. We then simply compare one iterator to the other one to ensure at each step that the nth and nth last bytes are equivalent; if they are, it returns true; if not, false.
Upvotes: 54