Reputation: 366
I have a string from which I want to remove the whitespace between the digits:
string test = "Some Words 1 2 3 4";
string result = Regex.Replace(test, @"(\d)\s(\d)", @"$1$2");
The expected result is:
"Some Words 1234"
but I get the following instead:
"Some Words 12 34"
What am I doing wrong here?
Further examples:
Input: "Some Words That Should not be replaced 12 9 123 4 12"
Output: "Some Words That Should not be replaced 129123412"
Input: "test 9 8"
Output: "test 98"
Input: "t e s t 9 8"
Output: "t e s t 98"
Input: "Another 12 000"
Output: "Another 12000"
Upvotes: 33
Views: 2238
Reputation: 627488
Your regex consumes the digit on the right. (\d)\s(\d) matches and captures "1" in "Some Words 1 2 3 4" into Group 1, then matches one whitespace character, and then matches and consumes "2" (i.e. adds it to the match value and advances the regex index). The regex engine then tries to find another match from the current index, which is already after "1 2". So the regex does not match "2 3", but finds "3 4".
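To make the consumption visible, here is a minimal sketch that lists every match of the original pattern together with its start index. DescribeMatches is a hypothetical helper name introduced only for this illustration; it is not part of the question's code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns "value@index" for each match of the original pattern.
static string[] DescribeMatches(string input) =>
    Regex.Matches(input, @"(\d)\s(\d)")
         .Cast<Match>()
         .Select(m => $"{m.Value}@{m.Index}")
         .ToArray();

// Both digits are part of each match, so the scan resumes after the second digit.
foreach (var line in DescribeMatches("Some Words 1 2 3 4"))
{
    Console.WriteLine(line);
}
// 1 2@11
// 3 4@15
```

Only two matches are reported, and "2 3" is never tried because the "2" was already consumed by the first match.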
Use lookarounds instead, since they are non-consuming:
(?<=\d)\s+(?=\d)
Details
(?<=\d) - a positive lookbehind that matches a location in the string immediately preceded by a digit
\s+ - one or more whitespace characters
(?=\d) - a positive lookahead that matches a location in the string immediately followed by a digit.
C# demo:
string test = "Some Words 1 2 3 4";
string result = Regex.Replace(test, @"(?<=\d)\s+(?=\d)", "");
A fuller demo over all the sample inputs:
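using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
var strs = new List<string> {"Some Words 1 2 3 4", "Some Words That Should not be replaced 12 9 123 4 12", "test 9 8", "t e s t 9 8", "Another 12 000" };
foreach (var test in strs)
{
    // Remove every run of whitespace that sits between two digits
    Console.WriteLine(Regex.Replace(test, @"(?<=\d)\s+(?=\d)", ""));
}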
Output:
Some Words 1234
Some Words That Should not be replaced 129123412
test 98
t e s t 98
Another 12000
Upvotes: 44
Reputation: 172468
Regex.Replace continues to search after the previous match:
Some Words 1 2 3 4
           ^^^
         first match, replace by "12"

Some Words 12 3 4
             ^
             +-- continue searching here

Some Words 12 3 4
              ^^^
            next match, replace by "34"
You can use a zero-width positive lookahead assertion to avoid that:
string result = Regex.Replace(test, @"(\d)\s(?=\d)", @"$1");
Now the final digit is not part of the match:
Some Words 1 2 3 4
           ^^?
         first match, replace by "1"

Some Words 12 3 4
            ^
            +-- continue searching here

Some Words 12 3 4
            ^^?
          next match, replace by "2"
...
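As a quick check, the lookahead version can be run against some of the inputs from the question. This is only a sketch; JoinDigits is a hypothetical helper name that wraps the Regex.Replace call shown above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the lookahead-based replacement from this answer.
static string JoinDigits(string input) =>
    Regex.Replace(input, @"(\d)\s(?=\d)", "$1");

// The digit after the whitespace is only asserted, not consumed,
// so it can serve as the first digit of the next match.
Console.WriteLine(JoinDigits("Some Words 1 2 3 4")); // Some Words 1234
Console.WriteLine(JoinDigits("t e s t 9 8"));        // t e s t 98
Console.WriteLine(JoinDigits("Another 12 000"));     // Another 12000
```

Note that this pattern uses a single \s, so it only removes one whitespace character between two digits; if runs of several whitespace characters can occur, \s+ would be needed in its place.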
Upvotes: 44