aues

Reputation: 366

Removing whitespace between consecutive numbers

I have a string from which I want to remove the whitespace between the numbers:

string test = "Some Words 1 2 3 4";
string result = Regex.Replace(test, @"(\d)\s(\d)", @"$1$2");

The expected result is:

"Some Words 1234"

But I get the following:

"Some Words 12 34"

What am I doing wrong here?

Further examples:

Input:  "Some Words That Should not be replaced 12 9 123 4 12"
Output: "Some Words That Should not be replaced 129123412"

Input:  "test 9 8"
Output: "test 98"

Input:  "t e s t 9 8"
Output: "t e s t 98"

Input:  "Another 12 000"
Output: "Another 12000"

Upvotes: 33

Views: 2238

Answers (2)

Wiktor Stribiżew

Reputation: 627488

Your regex consumes the digit on the right. (\d)\s(\d) matches and captures 1 in Some Words 1 2 3 4 into Group 1, then matches one whitespace character, and then matches and consumes 2 (that is, adds it to the match value and advances the regex index). The regex engine then looks for the next match starting at the current index, which is already past 1 2. As a result, the regex never matches 2 3 and finds 3 4 instead.
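
To make the consumption visible, here is a minimal sketch that lists the matches of the original pattern with Regex.Matches instead of replacing them. The variable names are only illustrative; the input string is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

string test = "Some Words 1 2 3 4";

// Enumerate the matches of the original, consuming pattern.
MatchCollection matches = Regex.Matches(test, @"(\d)\s(\d)");
foreach (Match m in matches)
{
    // Each match spans digit + whitespace + digit, so the next search
    // starts after the second digit and "2 3" is never tried.
    Console.WriteLine($"index {m.Index}, length {m.Length}: \"{m.Value}\"");
}
// Prints:
// index 11, length 3: "1 2"
// index 15, length 3: "3 4"
```

Only two matches are reported, which is why one space survives in the replaced string.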

Use lookarounds instead, which are non-consuming:

(?<=\d)\s+(?=\d)

Details

  • (?<=\d) - a positive lookbehind that matches a location in the string immediately preceded by a digit
  • \s+ - one or more whitespace characters
  • (?=\d) - a positive lookahead that matches a location in the string immediately followed by a digit.
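
As a rough illustration of the three parts working together, the sketch below lists what the lookaround pattern actually matches. The input is a hypothetical variation of the question's string with a three-space run added, to show why \s+ is used rather than \s.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: same as the question's string, but with three spaces between 2 and 3.
string input = "Some Words 1 2   3 4";
string pattern = @"(?<=\d)\s+(?=\d)";

// The lookarounds only assert that digits are present; the match itself is whitespace only.
MatchCollection gaps = Regex.Matches(input, pattern);
foreach (Match m in gaps)
{
    Console.WriteLine($"index {m.Index}, length {m.Length}");
}
// Prints:
// index 12, length 1
// index 14, length 3
// index 18, length 1

string joined = Regex.Replace(input, pattern, "");
Console.WriteLine(joined); // Some Words 1234
```

Because no digit is consumed, every digit stays available as the left-hand context of the next gap.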

C# demo:

string test = "Some Words 1 2 3 4";
string result = Regex.Replace(test, @"(?<=\d)\s+(?=\d)", "");

A demo over all the sample inputs:

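using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample inputs from the question.
var strs = new List<string> {"Some Words 1 2 3 4", "Some Words That Should not be replaced 12 9 123 4 12", "test 9 8", "t e s t 9 8", "Another 12 000" };
foreach (var test in strs)
{
    // Remove any whitespace run that sits between two digits.
    Console.WriteLine(Regex.Replace(test, @"(?<=\d)\s+(?=\d)", ""));
}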

Output:

Some Words 1234
Some Words That Should not be replaced 129123412
test 98
t e s t 98
Another 12000

Upvotes: 44

Heinzi

Reputation: 172468

Regex.Replace continues to search after the previous match:

Some Words 1 2 3 4
           ^^^
         first match, replace by "12"

Some Words 12 3 4
             ^
             +-- continue searching here

Some Words 12 3 4
              ^^^
            next match, replace by "34"

You can use a zero-width positive lookahead assertion to avoid that:

string result = Regex.Replace(test, @"(\d)\s(?=\d)", @"$1");

Now the final digit is not part of the match:

Some Words 1 2 3 4
           ^^?
         first match, replace by "1"

Some Words 12 3 4
            ^
            +-- continue searching here

Some Words 12 3 4
            ^^?
            next match, replace by "2"

...
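
As a quick check, the lookahead variant can be run against the sample inputs from the question. This is only a sketch: the cases array and the failures counter are illustrative names, and the expected strings are copied from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample inputs and expected outputs, copied from the question.
var cases = new (string Input, string Expected)[]
{
    ("Some Words 1 2 3 4", "Some Words 1234"),
    ("Some Words That Should not be replaced 12 9 123 4 12", "Some Words That Should not be replaced 129123412"),
    ("test 9 8", "test 98"),
    ("t e s t 9 8", "t e s t 98"),
    ("Another 12 000", "Another 12000"),
};

int failures = 0;
foreach (var (input, expected) in cases)
{
    // The left digit is captured and re-emitted via $1; the right digit is only asserted.
    string actual = Regex.Replace(input, @"(\d)\s(?=\d)", "$1");
    if (actual != expected) failures++;
    Console.WriteLine(actual);
}
Console.WriteLine($"failures: {failures}");
```

Note that this pattern uses \s, so it removes exactly one whitespace character per match; for runs of several spaces between digits, \s+ would be needed.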

Upvotes: 44
