Reputation: 4136
I don't understand how to write a regex that removes all of the following patterns (with something like .replace(pattern, "")):
two or more spaces not in a string removed
two or more spaces in a string reduced to one (ex: " text other " -> "text other")
one or more spaces removed after and before characters such as:
\n
\r\n
\t
replace \r\n with \n
I tried with +|\\n +|\t +\\r\n .+
but obviously this doesn't fully work.
The assertions below can be used to check that a solution works:
assert_eq!(not_useful_space(" "), "");
assert_eq!(not_useful_space(" a l l lower "), "a l l lower");
assert_eq!(not_useful_space(" i need\n new lines\n\n many times "), "i need\nnew lines\n\nmany times");
assert_eq!(not_useful_space(" i need \n new lines \n\n many times "), "i need\nnew lines\n\nmany times");
assert_eq!(not_useful_space(" i need \r\n new lines\r\nmany times "), "i need\nnew lines\nmany times");
assert_eq!(not_useful_space(" i need \t new lines\t \t many times "), "i need new lines many times");
assert_eq!(not_useful_space(" à la "), "à la");
Upvotes: 1
Views: 184
Reputation: 361710
If you're interested, here's a non-regex version:
fn not_useful_space(text: &str) -> String {
    text.lines()
        .map(|line| {
            line.trim()
                .split_ascii_whitespace()
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect::<Vec<_>>()
        .join("\n")
}
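A few edge cases may be worth checking before relying on this version. The sketch below repeats the function unchanged and adds a hypothetical helper, `check_edge_cases` (my own name, not part of the answer). It relies on `str::lines()` splitting on both `\n` and `\r\n` and not yielding a final empty line, which means a trailing newline in the input is dropped.

```rust
fn not_useful_space(text: &str) -> String {
    text.lines()
        .map(|line| {
            line.trim()
                .split_ascii_whitespace()
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect::<Vec<_>>()
        .join("\n")
}

// Hypothetical helper, only here to exercise the function above.
fn check_edge_cases() {
    // "\r\n" is treated as a line ending, so the output uses "\n" only.
    assert_eq!(not_useful_space("a \r\n b"), "a\nb");
    // Empty lines in the middle of the text are preserved.
    assert_eq!(not_useful_space("a\n\n b"), "a\n\nb");
    // A trailing newline is not preserved, because `lines()` yields no final empty line.
    assert_eq!(not_useful_space("a \n"), "a");
    // Mixed runs of tabs and spaces collapse to a single space.
    assert_eq!(not_useful_space("a\t \tb"), "a b");
}
```

If the trailing newline matters for your input, it would have to be re-appended by the caller.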
Upvotes: 1
Reputation: 785256
You can do this in a single regex with the MULTILINE flag enabled:
(?m)[ \t]*\r[ \t]*|^[ \t]+|[ \t]+$|([ \t]){2,}
Replace each match with the $1 string.
Rust Code:
use once_cell::sync::Lazy;
use regex::Regex;

pub fn magic(input: &str) -> String {
    // Compile the pattern once and reuse it across calls.
    static REGEX: Lazy<Regex> = Lazy::new(|| {
        Regex::new(r"(?m)[ \t]*\r[ \t]*|^[ \t]+|[ \t]+$|([ \t]){2,}").unwrap()
    });
    // $1 is empty unless the last alternative (2+ spaces/tabs) matched.
    REGEX.replace_all(input, "$1").to_string()
}
#[cfg(test)]
fn magic_data() -> std::collections::HashMap<&'static str, &'static str> {
    std::collections::HashMap::from([
        (" ", ""),
        (" a l l lower ", "a l l lower"),
        (
            " i need\n new lines\n\n many times ",
            "i need\nnew lines\n\nmany times",
        ),
        (
            " i need \n new lines \n\n many times ",
            "i need\nnew lines\n\nmany times",
        ),
        (
            " i need \r\n new lines\r\nmany times ",
            "i need\nnew lines\nmany times",
        ),
        (
            " i need \t new lines\t \t many times ",
            "i need new lines many times",
        ),
        (" à la ", "à la"),
    ])
}

#[test]
fn test() {
    for (k, v) in magic_data() {
        assert_eq!(magic(k), v)
    }
}
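For comparison, the same rules can also be read as a single hand-written pass over the characters, without the regex crate. This is only a sketch under stated assumptions, not the answer's code: `collapse_blanks` is a hypothetical name, `\r` is assumed to occur only as part of `\r\n` (it is simply skipped), and any run of spaces/tabs between two words, even a single tab, is written back as one space, whereas the regex leaves a lone tab alone and keeps the last character of a longer run.

```rust
/// Hypothetical std-only counterpart to the regex above (not the answer's code).
fn collapse_blanks(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    // True when a run of spaces/tabs has been seen but not yet written.
    let mut pending_blank = false;
    for ch in input.chars() {
        match ch {
            ' ' | '\t' => pending_blank = true,
            // Assumption: '\r' only occurs as part of "\r\n", so it is skipped.
            '\r' => {}
            '\n' => {
                // Blanks before a line break are trailing blanks: discard them.
                pending_blank = false;
                out.push('\n');
            }
            _ => {
                // Write one space only between two words on the same line.
                if pending_blank && !out.is_empty() && !out.ends_with('\n') {
                    out.push(' ');
                }
                pending_blank = false;
                out.push(ch);
            }
        }
    }
    out
}
```

Unlike a `lines()`-based approach, this pass keeps a trailing newline if the input has one, since every `\n` is copied through.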
JavaScript Demo:
function assert_eq(lhs, rhs) {
    console.log(lhs == rhs);
}
function not_useful_space(str) {
    return str.replace(/^[ \t]+|[ \t]+$|\r|([ \t]){2,}/mg, '$1');
}
assert_eq(not_useful_space(" "), "");
assert_eq(not_useful_space(" a l l lower "), "a l l lower");
assert_eq(not_useful_space(" i need\n new lines\n\n many times "), "i need\nnew lines\n\nmany times");
assert_eq(not_useful_space(" i need \n new lines \n\n many times "), "i need\nnew lines\n\nmany times");
assert_eq(not_useful_space(" i need \r\n new lines\r\nmany times "), "i need\nnew lines\nmany times");
assert_eq(not_useful_space(" i need \t new lines\t \t many times "), "i need new lines many times");
assert_eq(not_useful_space(" à la "), "à la");
RegEx Breakup:
[ \t]*\r[ \t]* : match \r surrounded by optional spaces or tabs on both sides
| : OR
^ : start of line
[ \t]+ : match 1+ space or tab characters
| : OR
[ \t]+ : match 1+ space or tab characters
$ : end of line
| : OR
([ \t]){2,} : match 2+ space or tab characters; the group keeps only the last one matched
$1 : the replacement, which puts a single space/tab character back in the substitution
Upvotes: 2