Reputation: 6781
How do I split a string in Rust such that contiguous delimiters are collapsed into one? For example, "1 2 3".splitX(" ") should yield the Vec ["1", "2", "3"] (when collected from the Split object, or whatever other intermediate object there may be). This example uses whitespace, but the approach should extend to other delimiters too.
I believe we can use .filter() to remove empty items after using .split(), but it would be cleaner if it could be done as part of the original .split() directly. I searched for this thoroughly and am surprised I can't find the answer anywhere.
I know that for whitespace we already have split_whitespace() and split_ascii_whitespace(), but I am looking for a solution that works for a general delimiter string.
Upvotes: 3
Views: 1353
Reputation: 243
As stated by others, split with filter, or a regex, is the better choice here. There is one more pattern that can be used, flat_map, though in this context it doesn't add much value.
fn main() {
    let output: Vec<&str> = "1 2 3"
        .split(" ")
        // Option implements IntoIterator, so None contributes no items.
        .flat_map(|x| if !x.is_empty() { Some(x) } else { None })
        .collect();
    println!("{:#?}", output)
}
You can use this pattern, say, if you want to parse these strings as numbers and ignore error values.
fn main() {
    let output: Vec<i32> = "1 2 3"
        .split(" ")
        // Result implements IntoIterator too, so Err values are skipped.
        .flat_map(|x| x.parse())
        .collect();
    println!("{:#?}", output)
}
All flat_map cares about is that the closure returns something which implements IntoIterator. Both Option and Result do.
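For comparison, here is a small sketch of the same idea written once with flat_map and once with filter_map, which accepts an Option directly. The function names are made up for illustration, and the input string with repeated spaces is an assumed example rather than one from the thread.

```rust
/// Keeps only the pieces that parse as `i32`, silently dropping the rest.
/// (Hypothetical helper, named for this example only.)
fn parse_valid(input: &str) -> Vec<i32> {
    input
        .split(' ')
        // `Result` implements `IntoIterator`: `Ok(n)` yields `n`, `Err(_)` yields nothing.
        .flat_map(|s| s.parse::<i32>())
        .collect()
}

/// Same result with `filter_map`, which takes an `Option` directly.
fn parse_valid_filter_map(input: &str) -> Vec<i32> {
    input
        .split(' ')
        .filter_map(|s| s.parse::<i32>().ok())
        .collect()
}

fn main() {
    // Empty pieces between repeated spaces and the non-numeric "x" both fail to parse.
    println!("{:?}", parse_valid("1  x 2   3"));
    println!("{:?}", parse_valid_filter_map("1  x 2   3"));
}
```

Both versions drop empty pieces as a side effect, because an empty string does not parse as a number.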
Upvotes: 1
Reputation: 382464
The standard solution is to use split then filter:
let output: Vec<&str> = input
    .split(pattern)
    .filter(|s| !s.is_empty())
    .collect();
This is fast and clear.
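As a self-contained sketch of the split-then-filter approach, the snippet below wraps it in a helper and tries it with a single-character and a multi-character delimiter. The name split_collapsed and the sample inputs are assumptions made for this example.

```rust
/// Splits `input` on `delim` and drops the empty pieces that appear
/// between adjacent delimiters (and at either end of the input).
/// (Hypothetical helper, named for this example only.)
fn split_collapsed<'a>(input: &'a str, delim: &str) -> Vec<&'a str> {
    input.split(delim).filter(|s| !s.is_empty()).collect()
}

fn main() {
    // Runs of spaces collapse to a single separator.
    println!("{:?}", split_collapsed("1  2     3", " "));
    // The same works for a multi-character delimiter.
    println!("{:?}", split_collapsed("a--b----c--", "--"));
}
```

Note that leading and trailing delimiters also produce empty pieces, which the filter removes along with the interior ones.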
You can also use a regular expression to avoid the filter step:
let output: Vec<&str> = regex::Regex::new(" +").unwrap()
    .split(input)
    .collect();
If it's in a function which will be called several times, you can avoid repeating the Regex compilation with lazy_regex:
let output: Vec<&str> = lazy_regex::regex!(" +")
    .split(input)
    .collect();
Upvotes: 8
Reputation: 10216
IMO, by far the cleanest way is to write .split(" ").filter(|s| !s.is_empty()). It works for all separators and the intent is obvious from reading the code.
If that's too "ugly", you could perhaps pull it into a trait:
trait SplitNonEmpty {
    // you might want to define your own struct for the return type
    // (the return type must come before the where clause)
    fn split_non_empty<'a, P>(&self, p: P) -> ...
    where
        P: Pattern<'a>;
}
impl SplitNonEmpty for &str {
    // ...
}
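One way to fill in that outline, as a rough sketch: it assumes the delimiter is a plain &str rather than a generic Pattern, so the trait bound from the outline is not needed. The NonEmpty alias and the is_non_empty helper are names invented here, and the trait is implemented for str instead of &str so that it can be called on any string slice.

```rust
use std::iter::Filter;
use std::str::Split;

/// Hypothetical alias standing in for "your own struct for the return type".
type NonEmpty<'a> = Filter<Split<'a, &'a str>, fn(&&'a str) -> bool>;

/// Illustrative extension trait; restricted to `&str` delimiters by assumption.
trait SplitNonEmpty {
    fn split_non_empty<'a>(&'a self, delim: &'a str) -> NonEmpty<'a>;
}

/// Predicate as a named function so it can be used as a plain fn pointer.
fn is_non_empty(s: &&str) -> bool {
    !s.is_empty()
}

impl SplitNonEmpty for str {
    fn split_non_empty<'a>(&'a self, delim: &'a str) -> NonEmpty<'a> {
        self.split(delim)
            .filter(is_non_empty as fn(&&'a str) -> bool)
    }
}

fn main() {
    let parts: Vec<&str> = "a,,b,,,c".split_non_empty(",").collect();
    println!("{:?}", parts);
}
```

Using a fn pointer in the alias keeps the return type nameable without boxing; a dedicated wrapper struct would be the tidier choice if the type is part of a public API.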
If it's very important that this function returns a Split, you might need to refactor your code to use traits more: do you really care that it was created by splitting a string, or do you only care that you can iterate over it? If it's the latter, maybe the consuming function should take an impl IntoIterator<Item = &'a str>.
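To illustrate that last point, here is a minimal sketch of a consumer that accepts anything yielding string slices. The function count_fields is hypothetical and stands in for whatever code currently expects a Split.

```rust
/// Hypothetical consumer: it only needs something that yields string slices,
/// not a concrete `Split`.
fn count_fields<'a>(fields: impl IntoIterator<Item = &'a str>) -> usize {
    fields.into_iter().count()
}

fn main() {
    // A filtered split can be passed in directly...
    let a = count_fields("1  2   3".split(' ').filter(|s| !s.is_empty()));
    // ...and so can any other source of `&str`, such as a Vec.
    let b = count_fields(vec!["4", "5"]);
    println!("{} {}", a, b);
}
```

With this signature the caller is free to swap between split plus filter, a regex split, or a pre-built collection without touching the consumer.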
Upvotes: 1