Reputation: 33
I'd like to sanitize a string so that all whitespace is removed, except whitespace between words and around hyphens:
1234 - Text | OneWord , Multiple Words | Another Text , 456
-> 1234 - Text|OneWord,Multiple Words|Another Text,456
std::regex regex(R"(\B\s+|\s+\B)"); //get rid of whitespaces except between words
auto newStr = std::regex_replace(str, regex, "*");
newStr = std::regex_replace(newStr, std::regex(R"(\*-\*)"), " - "); // restore the spaces around hyphens
newStr = std::regex_replace(newStr, std::regex(R"(\*)"), ""); // drop the remaining markers
This is what I currently use, but it is rather ugly, and I'm wondering whether there is a regex I can use to do this in one go.
Upvotes: 3
Views: 179
Reputation: 626709
You can use
(\s+-\s+|\b\s+\b)|\s+
Replace with $1, a backreference to the substring captured in Group 1. See the regex demo. Details:
- (\s+-\s+|\b\s+\b) - Group 1: a - with one or more whitespaces on both sides, or one or more whitespaces between word boundaries
- | - or
- \s+ - one or more whitespaces.
See the C++ demo:
std::string s("1234 - Text | OneWord , Multiple Words | Another Text , 456");
std::regex reg(R"((\s+-\s+|\b\s+\b)|\s+)");
std::cout << std::regex_replace(s, reg, "$1") << std::endl;
// => 1234 - Text|OneWord,Multiple Words|Another Text,456
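For reuse, the same replacement can be wrapped in a small function. This is only a sketch: the helper name `sanitize` is hypothetical and not part of the answer, and it assumes the default ECMAScript grammar of `std::regex`, where `$1` expands to an empty string whenever Group 1 did not take part in the match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper; the name `sanitize` is an assumption, not from the answer.
std::string sanitize(const std::string& input) {
    // Group 1 captures the whitespace worth keeping (around a hyphen or
    // between two word characters). The bare \s+ alternative captures
    // nothing, so "$1" replaces that whitespace with an empty string.
    static const std::regex re(R"((\s+-\s+|\b\s+\b)|\s+)");
    return std::regex_replace(input, re, "$1");
}
```

Calling `sanitize` on the sample string from the question should give `1234 - Text|OneWord,Multiple Words|Another Text,456`. Note that whitespace kept by Group 1 is preserved as-is, so a run of several spaces between two words is not collapsed to one.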
Upvotes: 2