Reputation: 1648
So I'm processing a report that (brilliantly, really) spits out number values with commas in them, in a .csv output. Super useful.
So, I'm using C# regex positive lookahead and positive lookbehind expressions to remove commas that have digits on both sides.
If I use only the lookahead, it seems to work. However when I add the lookbehind as well, the expression breaks down and removes nothing. Both ends of the comma can have arbitrary numbers of digits around them, so I just want to remove the comma if the pattern has one or more digits around it.
Here's the expression that works with the lookahead only:
str = Regex.Replace(str, @"[,](?=(\d+))", "");
Here's the expression that doesn't work as I intend it:
str = Regex.Replace(str, @"[,](?=(\d+)?<=(\d+))", "");
What's wrong with my regex? If I had to guess, there's something I'm misunderstanding about how lookbehind works. Any ideas?
Upvotes: 1
Views: 4103
Reputation: 627607
You may use any of the solutions below:
var s = "abc,def,2,100,xyz!,:))))";
Console.WriteLine(Regex.Replace(s, @"(\d),(\d)", "$1$2")); // Does not handle 1,2,3,4 cases
Console.WriteLine(Regex.Replace(s, @"(\d),(?=\d)", "$1")); // Handles consecutive matches with capturing group+backreference/lookahead
Console.WriteLine(Regex.Replace(s, @"(?<=\d),(?=\d)", "")); // Handles consecutive matches with lookbehind/lookahead, the most efficient way
Console.WriteLine(Regex.Replace(s, @",(?<=\d,)(?=\d)", "")); // Also handles all cases
See the C# demo.
Explanations:
(\d),(\d) - matches and captures a single digit on each side of the comma; $1$2 are replacement backreferences that insert the captured digits back into the result.
(\d),(?=\d) - matches and captures a digit before the comma, then matches the comma, and then the positive lookahead (?=\d) requires a digit after it. Since that digit is not consumed, only $1 is required in the replacement pattern.
(?<=\d),(?=\d) - matches only a comma that is enclosed by digits, without consuming the digits. (?<=\d) is a positive lookbehind that requires its pattern to match immediately to the left of the current location.
,(?<=\d,)(?=\d) - matches a comma and, only after matching it, checks whether there is a digit and a comma immediately before the current location (which is now after the comma). If that check is true, the next char is checked for a digit. If it is a digit, a match is returned.
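To make the difference between consuming and non-consuming patterns concrete, here is a minimal sketch that runs three of the patterns on an input with consecutive commas. The class and method names (CommaPatterns, Consuming, and so on) are made up for illustration, and the input "1,2,3,4" is just the case mentioned in the code comments above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommaPatterns
{
    // Consumes both digits, so a comma that shares a digit with the previous match is skipped.
    public static string Consuming(string s) =>
        Regex.Replace(s, @"(\d),(\d)", "$1$2");

    // Consumes only the left digit; the right digit is checked by a lookahead.
    public static string GroupAndLookahead(string s) =>
        Regex.Replace(s, @"(\d),(?=\d)", "$1");

    // Consumes only the comma; both neighbours are checked by lookarounds.
    public static string Lookarounds(string s) =>
        Regex.Replace(s, @"(?<=\d),(?=\d)", "");

    public static void Main()
    {
        const string input = "1,2,3,4";
        Console.WriteLine(Consuming(input));         // 12,34
        Console.WriteLine(GroupAndLookahead(input)); // 1234
        Console.WriteLine(Lookarounds(input));       // 1234
    }
}
```

The first pattern leaves the middle comma because the 2 was already consumed by the first match, so the next search starts at the comma before 3.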
Bonus:
You may just match a pattern like yours with \d,\d and pass the match to the MatchEvaluator method where you may manipulate the match further:
Console.WriteLine(Regex.Replace(s, @"\d,\d", m => m.Value.Replace(",",string.Empty))); // Callback method
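Here, m is the match object and m.Value holds the whole match value. With .Replace(",",string.Empty), you remove all commas from the match value. Note that \d,\d consumes both digits, so, like the first pattern above, it does not handle 1,2,3,4 cases.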
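If the callback grows beyond a one-liner, it can be written as a named method and passed as a MatchEvaluator delegate. The sketch below is one possible variation, not the code from the demo: the names EvaluatorDemo, DropCommas and Clean are hypothetical, and the pattern \d+(?:,\d+)+ is swapped in as an assumption so that a whole comma-grouped number is captured in a single match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvaluatorDemo
{
    // Hypothetical callback: receives each match and returns its replacement text.
    private static string DropCommas(Match m) => m.Value.Replace(",", string.Empty);

    // Assumed pattern: one or more digits followed by one or more ",digits" groups,
    // so the entire number (e.g. 1,000,000) is handed to the callback at once.
    public static string Clean(string s) =>
        Regex.Replace(s, @"\d+(?:,\d+)+", new MatchEvaluator(DropCommas));

    public static void Main()
    {
        Console.WriteLine(Clean("abc,def,2,100,xyz!,:))))")); // abc,def,2100,xyz!,:))))
        Console.WriteLine(Clean("1,2,3,4"));                  // 1234
    }
}
```

Because the whole number is one match, the consecutive-comma case is handled without lookarounds.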
Upvotes: 3
Reputation: 1800
You can always check a website that evaluates regex expressions. I think this code might be able to help you:
str = Regex.Replace(str, @"(?<=\d)[,](?=\d)", "");
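Where the lookbehind sits matters, because it is checked at the position where it appears in the pattern. Here is a small sketch of that, as far as I can tell; the class name LookbehindPlacement, the helper Strip and the sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindPlacement
{
    // Hypothetical helper: removes every match of the given pattern.
    public static string Strip(string s, string pattern) => Regex.Replace(s, pattern, "");

    public static void Main()
    {
        const string input = "7,250,rows";

        // Lookbehind before the comma: it inspects the digit to the comma's left.
        Console.WriteLine(Strip(input, @"(?<=\d)[,](?=\d)"));  // 7250,rows

        // Lookbehind after the comma: the character to its left is the comma itself,
        // so (?<=\d) cannot succeed there and nothing is removed.
        Console.WriteLine(Strip(input, @"[,](?=\d)(?<=\d)"));  // 7,250,rows

        // Lookbehind after the comma, but with the comma included in it: works.
        Console.WriteLine(Strip(input, @"[,](?<=\d,)(?=\d)")); // 7250,rows

        // The pattern from the question: "?<=" without its own "(" is not a lookbehind.
        // Inside the lookahead it reads as optional digits, a literal "<=", then digits.
        Console.WriteLine(Regex.IsMatch("7,<=5", @"[,](?=(\d+)?<=(\d+))")); // True
        Console.WriteLine(Regex.IsMatch(input, @"[,](?=(\d+)?<=(\d+))"));   // False
    }
}
```

The last two lines suggest why the expression in the question removes nothing: it only matches a comma that is followed by a literal <= sequence.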
Upvotes: 0