Reputation: 60003
I looked at the Rust docs for String but I can't find a way to extract a substring.
Is there a method like JavaScript's substr in Rust? If not, how would you implement it?
str.substr(start[, length])
The closest is probably slice_unchecked, but it uses byte offsets instead of character indexes and is marked unsafe.
Upvotes: 53
Views: 32814
Reputation: 51
The solution given by oli_obk does not correctly handle the last index of the slice.
Here, substr implements Rust-style slicing while taking corner cases into account.
pub fn substr(s: &str, begin: usize, end: Option<usize>) -> Option<&str> {
    use std::iter::once;
    // Byte offsets of every char boundary, including the end of the string.
    let mut itr = s.char_indices().map(|(n, _)| n).chain(once(s.len()));
    let begin_byte = itr.nth(begin)?;
    let end_byte = match end {
        Some(end) if begin >= end => begin_byte,
        Some(end) => itr.nth(end - begin - 1)?,
        None => s.len(),
    };
    Some(&s[begin_byte..end_byte])
}
// Tests
let s = "abc🙂";
assert_eq!(Some("bc"), substr(s, 1, Some(3)));
assert_eq!(Some("c🙂"), substr(s, 2, Some(4)));
assert_eq!(Some("c🙂"), substr(s, 2, None));
assert_eq!(Some(""), substr(s, 2, Some(2)));
assert_eq!(Some(""), substr(s, 2, Some(1)));
assert_eq!(None, substr(s, 2, Some(5)));
Note that this still does not handle Unicode grapheme clusters. For example, "y̆es" contains 4 Unicode characters but 3 grapheme clusters. This can be solved with the unicode-segmentation crate by replacing the .char_indices() iterator with .grapheme_indices(true).
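To make the caveat concrete, here is a small std-only sketch. It assumes a hypothetical helper, char_substr, that cuts by char index the same way the function above does, and shows that a cut between the two chars of one grapheme cluster separates the base letter from its combining mark.

```rust
/// Hypothetical helper: substring by `char` index, `end` exclusive.
fn char_substr(s: &str, begin: usize, end: usize) -> String {
    s.chars().skip(begin).take(end.saturating_sub(begin)).collect()
}

fn main() {
    // 'y' followed by U+0306 COMBINING BREVE, then "es".
    let s = "y\u{306}es";
    // Four `char`s, even though a reader sees three letters.
    assert_eq!(s.chars().count(), 4);
    // Taking "one character" keeps the base letter and drops the breve.
    assert_eq!(char_substr(s, 0, 1), "y");
    // The combining mark is left alone at the start of the remainder.
    assert_eq!(char_substr(s, 1, 4), "\u{306}es");
    println!("{} chars, {} bytes", s.chars().count(), s.len());
}
```

Whether this matters depends on the input; for text that may contain combining marks or emoji sequences, cutting on grapheme boundaries is the safer choice.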
Upvotes: 1
Reputation: 2443
I couldn't find an exact substr implementation like the one I'm familiar with from other programming languages such as JavaScript and Dart.
Here is a possible implementation of a substr method for &str and String.
Let's define a trait so that we can add methods to built-in types (like extensions in Dart).
trait Substr {
    fn substr(&self, start: usize, end: usize) -> String;
}
Then implement this trait for &str:
impl<'a> Substr for &'a str {
    // `start` and `end` are character indexes; `end` is exclusive.
    fn substr(&self, start: usize, end: usize) -> String {
        if start >= end {
            return String::new();
        }
        self.chars().skip(start).take(end - start).collect()
    }
}
Try:
fn main() {
    let string = "Hello, world!";
    let substring = string.substr(0, 4);
    println!("{}", substring); // Hell
}
Upvotes: -1
Reputation: 503
Knowing the various forms of the slice syntax might be helpful for some readers.
&s[6..11]
&s[0..1] is equivalent to &s[..1]
&s[3..s.len()] is equivalent to &s[3..]
&s[..] (the whole string)
&s[..=1] (inclusive end, equivalent to &s[..2])
Link to docs: https://doc.rust-lang.org/book/ch04-03-slices.html
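The equivalences can be checked directly. The sketch below also shows something the list does not say: for &str these ranges are byte offsets, so a range that ends inside a multi-byte character panics with the indexing syntax, while str::get returns None instead. The sample strings are my own.

```rust
fn main() {
    let s = "hello world";
    assert_eq!(&s[6..11], "world");
    assert_eq!(&s[0..1], &s[..1]);
    assert_eq!(&s[3..s.len()], &s[3..]);
    assert_eq!(&s[..], s);
    assert_eq!(&s[..=1], &s[..2]);

    // 'ñ' occupies bytes 1 and 2, so byte offset 2 is not a char boundary.
    let t = "añb";
    assert_eq!(t.get(0..1), Some("a"));
    assert_eq!(t.get(0..2), None); // `&t[0..2]` would panic here
    assert_eq!(t.get(1..3), Some("ñ"));
    println!("all slice checks passed");
}
```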
Upvotes: 1
Reputation: 1
I'm not very experienced in Rust, but I gave it a try. If someone can correct my answer, please don't hesitate.
// Note: `end` is inclusive, and the function panics if `end` is past the last character.
fn substring(string: String, start: u32, end: u32) -> String {
    let mut substr = String::new();
    let mut i = start;
    while i < end + 1 {
        // chars().nth() rescans the string from the beginning on every iteration.
        substr.push(string.chars().nth(i as usize).unwrap());
        i += 1;
    }
    return substr;
}
Here is a playground
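One possible correction, as a sketch rather than a definitive version: it keeps the inclusive end of the function above, but borrows the input as &str, walks the characters only once, and returns what is available instead of panicking when end is out of range. The choice of usize for the indexes is my assumption.

```rust
/// Returns the characters from index `start` through `end`, both inclusive.
/// Out-of-range indexes are clamped to the end of the string.
fn substring(string: &str, start: usize, end: usize) -> String {
    if end < start {
        return String::new();
    }
    // `end - start + 1` characters, guarding against overflow at usize::MAX.
    let count = (end - start).saturating_add(1);
    string.chars().skip(start).take(count).collect()
}

fn main() {
    assert_eq!(substring("Hello, world!", 7, 11), "world");
    assert_eq!(substring("abc", 1, 10), "bc");
    assert_eq!(substring("abc", 2, 1), "");
    println!("{}", substring("Hello, world!", 7, 11));
}
```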
Upvotes: -1
Reputation: 48087
For characters, you can use s.chars().skip(pos).take(len):
fn main() {
    let s = "Hello, world!";
    let ss: String = s.chars().skip(7).take(5).collect();
    println!("{}", ss);
}
Beware of the definition of Unicode characters though.
For bytes, you can use the slice syntax:
fn main() {
    let s = b"Hello, world!";
    let ss = &s[7..12];
    println!("{:?}", ss);
}
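The byte version prints a list of numbers because the result is a &[u8], not a &str. A minimal sketch of getting text back with std::str::from_utf8 follows; bytes_to_str is a hypothetical helper name, and the second check shows why the conversion has to be fallible.

```rust
use std::str;

/// Hypothetical helper: view a byte slice as text if it is valid UTF-8.
fn bytes_to_str(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    let s = b"Hello, world!";
    let ss = &s[7..12];
    assert_eq!(bytes_to_str(ss), Some("world"));

    // 'é' is two bytes long, so cutting after byte 2 splits it in half.
    let bytes = "héllo".as_bytes();
    assert_eq!(bytes_to_str(&bytes[..2]), None);
    assert_eq!(bytes_to_str(&bytes[..3]), Some("hé"));
    println!("{:?}", bytes_to_str(ss));
}
```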
Upvotes: 84
Reputation: 14030
You can also use .to_string()[<range>].
This example takes an immutable slice of a copy of the original string, then mutates the original to demonstrate that the slice is preserved.
let mut s: String = "Hello, world!".to_string();
let substring: &str = &s.to_string()[..6]; // borrows from the temporary copy, not from s
s.replace_range(..6, "Goodbye,");
println!("{} {} universe!", s, substring);
// Goodbye, world! Hello, universe!
Upvotes: -2
Reputation: 199
This code performs both substring-ing and string-slicing, without panicking or allocating:
use std::ops::{Bound, RangeBounds};

trait StringUtils {
    fn substring(&self, start: usize, len: usize) -> &str;
    fn slice(&self, range: impl RangeBounds<usize>) -> &str;
}

impl StringUtils for str {
    // Returns up to `len` characters starting at character index `start`.
    fn substring(&self, start: usize, len: usize) -> &str {
        let mut char_pos = 0;
        let mut byte_start = 0;
        let mut it = self.chars();
        // Advance to the byte offset of character `start`.
        loop {
            if char_pos == start { break; }
            if let Some(c) = it.next() {
                char_pos += 1;
                byte_start += c.len_utf8();
            }
            else { break; }
        }
        char_pos = 0;
        let mut byte_end = byte_start;
        // Advance `len` more characters, or to the end of the string.
        loop {
            if char_pos == len { break; }
            if let Some(c) = it.next() {
                char_pos += 1;
                byte_end += c.len_utf8();
            }
            else { break; }
        }
        &self[byte_start..byte_end]
    }

    fn slice(&self, range: impl RangeBounds<usize>) -> &str {
        let start = match range.start_bound() {
            Bound::Included(bound) => *bound,
            // An excluded start bound begins one character later.
            Bound::Excluded(bound) => bound.saturating_add(1),
            Bound::Unbounded => 0,
        };
        let end = match range.end_bound() {
            Bound::Included(bound) => bound.saturating_add(1),
            Bound::Excluded(bound) => *bound,
            // The byte length is an upper bound on the character count.
            Bound::Unbounded => self.len(),
        };
        // saturating_sub avoids an underflow panic when end < start.
        self.substring(start, end.saturating_sub(start))
    }
}

fn main() {
    let s = "abcdèfghij";
    // All three statements should print:
    // "abcdè, abcdèfghij, dèfgh, dèfghij."
    println!("{}, {}, {}, {}.",
        s.substring(0, 5),
        s.substring(0, 50),
        s.substring(3, 5),
        s.substring(3, 50));
    println!("{}, {}, {}, {}.",
        s.slice(..5),
        s.slice(..50),
        s.slice(3..8),
        s.slice(3..));
    println!("{}, {}, {}, {}.",
        s.slice(..=4),
        s.slice(..=49),
        s.slice(3..=7),
        s.slice(3..));
}
Upvotes: 8
Reputation: 214
For my_string.substring(start, len)-like syntax, you can write a custom trait:
trait StringUtils {
    fn substring(&self, start: usize, len: usize) -> Self;
}
impl StringUtils for String {
    fn substring(&self, start: usize, len: usize) -> Self {
        self.chars().skip(start).take(len).collect()
    }
}
// Usage:
fn main() {
    let phrase: String = "this is a string".to_string();
    println!("{}", phrase.substring(5, 8)); // prints "is a str"
}
Upvotes: 8
Reputation: 31223
You can use the as_str method on the Chars iterator to get back a &str slice after you have advanced the iterator. So to skip the first start chars, you can call
let s = "Some text to slice into";
let mut iter = s.chars();
if start > 0 {
    iter.nth(start - 1); // nth(n) consumes n + 1 items, so this eats up exactly start chars
}
let slice = iter.as_str(); // get back a slice of the rest of the iterator
Now if you also want to limit the length, you first need to figure out the byte position of the character at index length:
// If fewer than length chars remain, take everything that is left.
let end_pos = slice.char_indices().nth(length).map(|(n, _)| n).unwrap_or(slice.len());
let substr = &slice[..end_pos];
This might feel a little roundabout, but Rust is not hiding anything from you that might take up CPU cycles. That said, I wonder why there's no crate yet that offers a substr method.
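The two steps can be wrapped into a single function. This is a minimal sketch under my own assumptions: the name substr and the clamping behavior (return whatever is available when start or length runs past the end) are illustrative choices, not something a standard API prescribes.

```rust
/// Hypothetical helper: up to `length` chars starting at char index `start`.
/// Borrows from `s`, so nothing is allocated.
fn substr(s: &str, start: usize, length: usize) -> &str {
    let mut iter = s.chars();
    // Consume exactly `start` chars, or stop early if the string is shorter.
    for _ in 0..start {
        if iter.next().is_none() {
            break;
        }
    }
    let rest = iter.as_str();
    // Byte offset of the char at index `length`, or the end of `rest`.
    let end_pos = rest
        .char_indices()
        .nth(length)
        .map(|(n, _)| n)
        .unwrap_or(rest.len());
    &rest[..end_pos]
}

fn main() {
    assert_eq!(substr("Some text to slice into", 5, 4), "text");
    assert_eq!(substr("abc🙂", 2, 2), "c🙂");
    assert_eq!(substr("abc", 1, 10), "bc");
    assert_eq!(substr("abc", 5, 1), "");
    println!("{}", substr("Some text to slice into", 5, 4));
}
```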
Upvotes: 18