Gaurang Tandon

Reputation: 6781

Split string in Rust, treating consecutive delimiters as one

How do I split a string in Rust such that contiguous delimiters are collapsed into one? For example:

"1  2 3".splitX(" ")

should yield this Vec: ["1", "2", "3"] (when collected from the Split object, or any other intermediate object there may be). This example uses a space, but the approach should extend to other delimiters too.

I believe we can use .filter() to remove empty items after using .split(), but it would be cleaner if this could be done by the .split() call itself. I have searched thoroughly and am surprised I can't find the answer anywhere.

I know for whitespace we already have split_whitespace() and split_ascii_whitespace(), but I am looking for a solution that works for a general delimiter string.
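To make the problem concrete, here is a small sketch contrasting plain `split(" ")`, which keeps an empty item for every extra delimiter, with `split_whitespace()`, which collapses runs but only for whitespace. The helper names `split_plain` and `split_ws` are made up for illustration.

```rust
/// Splits on a single space, keeping empty items (the behaviour of `str::split`).
fn split_plain(input: &str) -> Vec<&str> {
    input.split(" ").collect()
}

/// Splits on runs of whitespace; only usable when the delimiter is whitespace.
fn split_ws(input: &str) -> Vec<&str> {
    input.split_whitespace().collect()
}

fn main() {
    let input = "1  2 3";
    // The two adjacent spaces yield an empty item between "1" and "2".
    println!("{:?}", split_plain(input)); // ["1", "", "2", "3"]
    // split_whitespace collapses the run of spaces.
    println!("{:?}", split_ws(input)); // ["1", "2", "3"]
}
```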

Upvotes: 3

Views: 1353

Answers (3)

manu

Reputation: 243

As others have said, split followed by filter, or a regex, is the better choice here. There is one more pattern you can use, though: flat_map. In this context it doesn't add much value.

fn main() {
    let output: Vec<&str> = "1  2 3"
        .split(" ")
        .flat_map(|x| if !x.is_empty() { Some(x) } else { None })
        .collect();
    println!("{:#?}", output)
}

This pattern becomes more useful when, say, you want to parse the pieces as numbers and ignore the ones that fail to parse (which includes the empty strings):

fn main() {
    let output: Vec<i32> = "1  2 3"
        .split(" ")
        .flat_map(|x| x.parse())
        .collect();
    println!("{:#?}", output)
}

All flat_map cares about is that the closure returns something that implements IntoIterator. Option and Result both do, yielding one item for Some/Ok and none for None/Err.
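A small sketch of why this works, using made-up helper names: because `Option` and `Result` iterate over zero or one item, `flat_map` with an `Option`-returning closure gives the same result as `filter_map`, the adapter specialised for that case, and `flat_map` with `parse()` silently drops the `Err` values.

```rust
/// Keeps non-empty pieces using flat_map with an Option-returning closure.
fn non_empty_flat_map(input: &str) -> Vec<&str> {
    input
        .split(" ")
        .flat_map(|x| if x.is_empty() { None } else { Some(x) })
        .collect()
}

/// Same result with filter_map, which is meant for Option-returning closures.
fn non_empty_filter_map(input: &str) -> Vec<&str> {
    input
        .split(" ")
        .filter_map(|x| if x.is_empty() { None } else { Some(x) })
        .collect()
}

/// Parses each piece; pieces that fail to parse yield an Err, which iterates as nothing.
fn parse_ok(input: &str) -> Vec<i32> {
    input.split(" ").flat_map(|x| x.parse::<i32>()).collect()
}

fn main() {
    // Option and Result behave like iterators of zero or one item.
    assert_eq!(Some(1).into_iter().count(), 1);
    assert_eq!(None::<i32>.into_iter().count(), 0);
    assert_eq!("x".parse::<i32>().into_iter().count(), 0);

    println!("{:?}", non_empty_flat_map("1  2 3")); // ["1", "2", "3"]
    println!("{:?}", non_empty_filter_map("1  2 3")); // ["1", "2", "3"]
    println!("{:?}", parse_ok("1  2 x 3")); // [1, 2, 3]
}
```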

Upvotes: 1

Denys Séguret

Reputation: 382464

The standard solution is to use split then filter:

let output: Vec<&str> = input
    .split(pattern)
    .filter(|s| !s.is_empty())
    .collect();

This is fast and clear.
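As a self-contained sketch of split-then-filter with an arbitrary delimiter string (the helper name `split_collapsed` is made up for this example): filtering out empty pieces also removes the ones produced by leading or trailing delimiters, and it works the same for multi-character delimiters.

```rust
/// Splits `input` on `delim`, treating runs of consecutive delimiters as one
/// by discarding the empty pieces that `str::split` produces between them.
fn split_collapsed<'a>(input: &'a str, delim: &str) -> Vec<&'a str> {
    input
        .split(delim)
        .filter(|s| !s.is_empty())
        .collect()
}

fn main() {
    println!("{:?}", split_collapsed("1  2 3", " ")); // ["1", "2", "3"]
    println!("{:?}", split_collapsed("a----b--c", "--")); // ["a", "b", "c"]
    println!("{:?}", split_collapsed(",x,,y,", ",")); // ["x", "y"]
}
```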

You can also use a regular expression to avoid the filter step (note that a leading or trailing delimiter still yields an empty first or last item):

let output: Vec<&str> = regex::Regex::new(" +").unwrap()
    .split(input)
    .collect();

If this is in a function that will be called several times, you can avoid recompiling the Regex on every call with the lazy_regex crate:

let output: Vec<&str> = lazy_regex::regex!(" +")
    .split(input)
    .collect();

Upvotes: 8

cameron1024

Reputation: 10216

IMO, by far the cleanest way is to write .split(" ").filter(|s| !s.is_empty()). It works for all separators and the intent is obvious from reading the code.

If that's too "ugly", you could perhaps pull it into a trait:

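trait SplitNonEmpty {
  // you might want to define your own struct for the return type;
  // this takes a &str delimiter because std's Pattern trait is unstable
  fn split_non_empty<'a>(&'a self, delim: &'a str) -> Box<dyn Iterator<Item = &'a str> + 'a>;
}

impl SplitNonEmpty for str {
  fn split_non_empty<'a>(&'a self, delim: &'a str) -> Box<dyn Iterator<Item = &'a str> + 'a> {
    // split as usual, then drop the empty pieces between consecutive delimiters
    Box::new(self.split(delim).filter(|s| !s.is_empty()))
  }
}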

If it's very important that this function returns a Split, you might need to refactor your code to use traits more: do you really care that it was created by splitting a string, or only that you can iterate over it? If the latter, maybe that function should take an impl IntoIterator<Item = &'a str>?
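A minimal sketch of that suggestion, using a hypothetical `join_fields` function: because it accepts `impl IntoIterator<Item = &'a str>`, it takes a filtered `Split` just as happily as a `Vec` or an array, so no caller needs to hand it a `Split` specifically.

```rust
/// Hypothetical consumer: it only cares that it can iterate over string slices.
fn join_fields<'a>(fields: impl IntoIterator<Item = &'a str>) -> String {
    fields.into_iter().collect::<Vec<_>>().join("|")
}

fn main() {
    // A filtered Split (no longer a Split type) is accepted...
    let from_split = join_fields("1  2 3".split(" ").filter(|s| !s.is_empty()));
    // ...and so is a plain Vec or array of &str.
    let from_vec = join_fields(vec!["1", "2", "3"]);
    let from_array = join_fields(["1", "2", "3"]);
    println!("{} {} {}", from_split, from_vec, from_array); // 1|2|3 1|2|3 1|2|3
}
```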

Upvotes: 1
