Fred Hors

Reputation: 4136

How to remove useless space in Rust string without using regex

Is there a way in Rust to do the following to a string:

  1. collapse repeated spaces into a single space, and
  2. remove spaces before and after \n, \r\n, and tabs?

As you can imagine, this text all comes from form inputs such as text fields and textareas.

All of this should work:

  1. without using regex
  2. with Unicode characters

Some tests to satisfy are:

#[test]
fn test() {
    assert_eq!(magic("  ".to_string()), " ");
    
    assert_eq!(
        magic("     a l    l   lower      ".to_string()),
        "a l l lower"
    );
    
    assert_eq!(
        magic("     i need\nnew  lines \n\nmany   times     ".to_string()),
        "i need\nnew lines\n\nmany times"
    );
    
    assert_eq!(magic("  à   la  ".to_string()), "à la");
}

In golang I'm using:

func Magic(s string) string {
    return strings.ReplaceAll(strings.Join(strings.FieldsFunc(s, func(r rune) bool {
        if r == '\n' {
            return false
        }

        return unicode.IsSpace(r)
    }), " "), " \n", "\n")
}

Upvotes: 0

Views: 1232

Answers (1)

PitaJ

Reputation: 15022

This is what I worked out:

fn magic(input: &str) -> String {
    input
        // trim leading and trailing space
        .trim()
        // split into lines
        .lines()
        .map(|part| {
            // for each line
            part
                // trim leading and trailing whitespace
                .trim()
                // split after each whitespace character, keeping that
                // character at the end of the piece it terminates
                .split_inclusive(char::is_whitespace)
                // filter out substrings containing only whitespace
                .filter(|part| !part.trim().is_empty())
                // collect into a String for this line
                .collect()
        })
        // collect into a Vec of Strings
        .collect::<Vec<String>>()
        // join those Strings with a newline
        // back into the final String
        .join("\n")
}

It handles a string containing only whitespace differently from your first test: it returns an empty string rather than a single space. It also normalizes all line breaks to \n.
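To make those two differences concrete, here is a small check. It repeats the function above so that the snippet stands alone; the inputs and the test name are only my own illustration:

```rust
fn magic(input: &str) -> String {
    input
        .trim()
        .lines()
        .map(|line| {
            line.trim()
                // keep each whitespace character attached to the piece before it
                .split_inclusive(char::is_whitespace)
                // drop the pieces that are nothing but whitespace
                .filter(|part| !part.trim().is_empty())
                .collect::<String>()
        })
        .collect::<Vec<String>>()
        .join("\n")
}

#[test]
fn caveats() {
    // a whitespace-only input becomes an empty string, not " "
    assert_eq!(magic("  "), "");
    // a Windows-style line break comes back as a plain "\n"
    assert_eq!(magic("a  b \r\n c"), "a b\nc");
}
```

If you need the single-space result for whitespace-only input, you would have to special-case it before calling the function.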


Here is a slightly different version that's probably faster, because it builds the output String directly instead of allocating a String per line plus a Vec:

fn magic(input: &str) -> String {
    let mut output: String = input
        // trim leading and trailing space
        .trim()
        // split into lines
        .lines()
        .flat_map(|part| {
            // for each line
            part
                // trim leading and trailing space
                .trim()
                // split after each whitespace character, keeping that
                // character at the end of the piece it terminates
                .split_inclusive(char::is_whitespace)
                // filter out substrings containing only whitespace
                .filter(|part| !part.trim().is_empty())
                // add a newline after each line
                .chain(["\n"])
        })
        // collect into a String
        .collect();
    
    // remove the last newline
    output.truncate(output.trim_end().len());
    
    output
}


I'll let you decide which you prefer.
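One thing to be aware of with both versions: they keep whichever whitespace character directly follows a word, so a tab between two words stays a tab. If you would rather end up with a plain space there, as the strings.Join(..., " ") in your Go code produces, a variant built on split_whitespace might look like the sketch below. The name magic_spaces is just for illustration, and I'm assuming that turning tabs into spaces is what you want:

```rust
fn magic_spaces(input: &str) -> String {
    input
        .trim()
        .lines()
        .map(|line| {
            // split_whitespace skips runs of any Unicode whitespace,
            // so joining with " " leaves exactly one space between words
            line.split_whitespace().collect::<Vec<&str>>().join(" ")
        })
        .collect::<Vec<String>>()
        .join("\n")
}

#[test]
fn tabs_become_spaces() {
    assert_eq!(magic_spaces("a\tb"), "a b");
    assert_eq!(
        magic_spaces("     i need\nnew  lines \n\nmany   times     "),
        "i need\nnew lines\n\nmany times"
    );
    assert_eq!(magic_spaces("  à   la  "), "à la");
}
```

This allocates a Vec per line, so it trades some speed for simplicity. Like the versions above, it returns an empty string for whitespace-only input and normalizes line breaks to \n.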

Upvotes: 2
