cruelflames

Reputation: 33

Regex to match all whitespace except whitespace between words and around hyphens?

I'd like to sanitize a string so that all whitespace is removed, except whitespace between words and whitespace surrounding hyphens. For example:

Input:  1234 - Text | OneWord , Multiple Words | Another Text , 456
Output: 1234 - Text|OneWord,Multiple Words|Another Text,456

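std::regex regex(R"(\B\s+|\s+\B)"); // match whitespace that is not between two words

auto newStr = std::regex_replace(str, regex, "*"); // mark that whitespace with a placeholder
newStr = std::regex_replace(newStr, std::regex(R"(\*-\*)"), " - "); // restore the spaces around hyphens
newStr = std::regex_replace(newStr, std::regex(R"(\*)"), ""); // drop the remaining placeholders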

This is what I currently use, but it is rather ugly, and I'm wondering whether there is a single regex that can do this in one go.

Upvotes: 3

Views: 179

Answers (1)

Wiktor Stribiżew

Reputation: 626709

You can use

(\s+-\s+|\b\s+\b)|\s+

Replace with $1, a backreference to the substring captured in Group 1. See the regex demo. Details:

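  • (\s+-\s+|\b\s+\b) - Group 1: a hyphen with one or more whitespace characters on both sides, or one or more whitespace characters between two word boundaries
  • | - or
  • \s+ - one or more whitespace characters anywhere else. Group 1 does not participate in these matches, so $1 is empty and the whitespace is removed.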
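
To make the role of $1 more concrete, here is a minimal sketch that performs the same replacement by hand. It assumes the default ECMAScript grammar of std::regex. The name keep_group1 is a hypothetical helper chosen for illustration; it is not part of the answer or of any library. Each match is either copied back (when Group 1 participated) or dropped (when only the trailing \s+ alternative matched).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: roughly what regex_replace(s, re, "$1") does,
// written out as an explicit loop over the matches.
std::string keep_group1(const std::string& s) {
    static const std::regex re(R"((\s+-\s+|\b\s+\b)|\s+)");
    std::string out;
    auto last = s.cbegin();  // end of the previous match
    for (std::sregex_iterator it(s.cbegin(), s.cend(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // copy the text between matches unchanged
        if (m[1].matched) {
            // Group 1 participated: whitespace around a hyphen or between words, keep it.
            out.append(m[1].first, m[1].second);
        }
        // Otherwise only the bare \s+ alternative matched: emit nothing.
        last = m[0].second;
    }
    out.append(last, s.cend());  // copy the tail after the final match
    return out;
}
```

With this helper, keep_group1("a , b") should yield "a,b", while "a b" and "x - y" come back unchanged.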

See the C++ demo:

std::string s("1234 - Text | OneWord , Multiple Words | Another Text , 456");
std::regex reg(R"((\s+-\s+|\b\s+\b)|\s+)");
std::cout << std::regex_replace(s, reg, "$1") << std::endl;
// => 1234 - Text|OneWord,Multiple Words|Another Text,456
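
One way to gain some confidence in the single-pass pattern is to run it side by side with the three-pass approach from the question. The sketch below wraps both in functions; the names sanitize_three_pass and sanitize_one_pass are hypothetical and chosen only for illustration. It assumes the input never contains a literal '*', because the three-pass version uses that character as a placeholder.

```cpp
#include <regex>
#include <string>

// Three-pass version from the question (assumes the input contains no '*').
std::string sanitize_three_pass(const std::string& str) {
    static const std::regex not_between_words(R"(\B\s+|\s+\B)");
    static const std::regex marked_hyphen(R"(\*-\*)");
    static const std::regex marker(R"(\*)");
    std::string out = std::regex_replace(str, not_between_words, "*");  // mark
    out = std::regex_replace(out, marked_hyphen, " - ");                // restore " - "
    return std::regex_replace(out, marker, "");                         // drop markers
}

// Single-pass version from the answer: keep Group 1, drop everything else matched.
std::string sanitize_one_pass(const std::string& str) {
    static const std::regex re(R"((\s+-\s+|\b\s+\b)|\s+)");
    return std::regex_replace(str, re, "$1");
}
```

On the sample string from the question, both functions should return 1234 - Text|OneWord,Multiple Words|Another Text,456. They are not guaranteed to agree on every input: for instance, with several consecutive spaces between two words, the single-pass pattern keeps the run as written, whereas the three-pass version may remove it.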

Upvotes: 2
