Scuba Steve

Reputation: 1648

Removing commas from numbers with .NET regex

So I'm processing a report that (brilliantly, really) spits out number values with commas in them, in a .csv output. Super useful.

So I'm using C# regex positive lookahead and positive lookbehind assertions to remove commas that have digits on both sides.

If I use only the lookahead, it seems to work. However, when I add the lookbehind as well, the expression breaks down and removes nothing. Either side of the comma can have any number of digits, so I just want to remove the comma if it has at least one digit on each side.

Here's the expression that works with the lookahead only:

str = Regex.Replace(str, @"[,](?=(\d+))", "");
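A runnable version of the lookahead-only call, as a sketch. It suggests why the pattern only *seems* to work: the lookahead checks the right-hand side of the comma and nothing else, so a comma between a word and a number is stripped as well. The sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Lookahead-only pattern: only checks that a digit FOLLOWS the comma.
// Nothing is checked on the left-hand side.
const string lookaheadOnly = @"[,](?=(\d+))";

string numeric = Regex.Replace("1,234,567", lookaheadOnly, "");
string mixed = Regex.Replace("total,42", lookaheadOnly, "");

Console.WriteLine(numeric); // 1234567
Console.WriteLine(mixed);   // total42: the comma after a word is removed too
```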

Here's the expression that doesn't work as I intend it:

str = Regex.Replace(str, @"[,](?=(\d+)?<=(\d+))", "");

What's wrong with my regex? If I had to guess, I'm misunderstanding something about how lookbehind works. Any ideas?
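Testing the second pattern directly hints at what goes wrong: a lookbehind has to be written as its own (?<=...) group, so the ?<= inside the lookahead is read as an optional quantifier on (\d+) followed by the literal characters <=. The inputs below are illustrative assumptions, not data from the report.

```csharp
using System;
using System.Text.RegularExpressions;

// Second pattern from the question. Inside (?= ... ), "?<=" is NOT a lookbehind:
// "?" makes the preceding (\d+) group optional and "<=" is matched literally.
// The lookahead therefore demands: optional digits, then "<=", then digits.
const string questionPattern = @"[,](?=(\d+)?<=(\d+))";

bool matchesNumber = Regex.IsMatch("1,000", questionPattern);   // no "<=" after the comma
bool matchesLiteral = Regex.IsMatch("1,5<=3", questionPattern); // "5", "<=", "3" satisfy it

// A lookbehind written as a separate group, placed before the comma.
const string lookaround = @"(?<=\d),(?=\d)";
string fixedResult = Regex.Replace("1,000", lookaround, "");

Console.WriteLine(matchesNumber);  // False
Console.WriteLine(matchesLiteral); // True
Console.WriteLine(fixedResult);    // 1000
```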

Upvotes: 1

Views: 4103

Answers (2)

Wiktor Stribiżew

Reputation: 627607

You may use any of the solutions below:

using System;
using System.Text.RegularExpressions;

var s = "abc,def,2,100,xyz!,:))))";
Console.WriteLine(Regex.Replace(s, @"(\d),(\d)", "$1$2"));   // Does not handle 1,2,3,4 cases
Console.WriteLine(Regex.Replace(s, @"(\d),(?=\d)", "$1"));   // Handles consecutive matches with capturing group+backreference/lookahead
Console.WriteLine(Regex.Replace(s, @"(?<=\d),(?=\d)", ""));  // Handles consecutive matches with lookbehind/lookahead, the most efficient way
Console.WriteLine(Regex.Replace(s, @",(?<=\d,)(?=\d)", "")); // Also handles all cases

See the C# demo.
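To make the "1,2,3,4" comment concrete, here is a sketch that runs the same four patterns on an input made of consecutive single-digit groups (the input string is an illustrative assumption). The first pattern consumes the digit to the right of each comma, so that digit cannot also serve as the left digit of the next comma.

```csharp
using System;
using System.Text.RegularExpressions;

// Consecutive single-digit groups: each inner digit sits between two commas.
var input = "1,2,3,4";

// Consumes the digit on BOTH sides, so "2" can't be reused for the next comma.
string captureBoth = Regex.Replace(input, @"(\d),(\d)", "$1$2");
// Consumes only the left digit; the right one is checked by a lookahead.
string captureLeft = Regex.Replace(input, @"(\d),(?=\d)", "$1");
// Consumes only the comma; both digits are checked by lookarounds.
string lookaround = Regex.Replace(input, @"(?<=\d),(?=\d)", "");
// Consumes the comma first, then looks back over "digit + comma".
string lookbehindAfter = Regex.Replace(input, @",(?<=\d,)(?=\d)", "");

Console.WriteLine(captureBoth);     // 12,34
Console.WriteLine(captureLeft);     // 1234
Console.WriteLine(lookaround);      // 1234
Console.WriteLine(lookbehindAfter); // 1234
```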

Explanations:

  • (\d),(\d) - matches and captures a single digit on each side of the comma; $1$2 are replacement backreferences that insert the captured text back into the result
  • (\d),(?=\d) - matches and captures a digit before the comma, then matches the comma, and then the positive lookahead (?=\d) requires a digit after it; since that digit is not consumed, only $1 is needed in the replacement pattern
  • (?<=\d),(?=\d) - matches only a comma that is enclosed by digits, without consuming the digits ((?<=\d) is a positive lookbehind that requires its pattern to match immediately to the left of the current location)
  • ,(?<=\d,)(?=\d) - matches a comma, and only after matching it does the regex engine check whether a digit and a comma appear immediately before the current location (that is, just after the comma); if that check succeeds, the next char is checked for a digit, and if it is a digit, a match is returned.
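The "without consuming the digits" point can be checked by inspecting the Match objects directly. This sketch compares the capturing pattern with the lookaround pattern on a made-up input; only the lookaround version yields a match that is just the comma.

```csharp
using System;
using System.Text.RegularExpressions;

const string text = "x2,1y";

// Capturing version: the digits are part of the match, so they have to be
// written back with $1$2 when replacing.
Match consuming = Regex.Match(text, @"(\d),(\d)");
// Lookaround version: (?<=\d) and (?=\d) are zero-width, so only the comma is matched.
Match zeroWidth = Regex.Match(text, @"(?<=\d),(?=\d)");

Console.WriteLine($"{consuming.Value} at {consuming.Index}"); // 2,1 at 1
Console.WriteLine($"{zeroWidth.Value} at {zeroWidth.Index}"); // , at 2
```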

RegexHero.net test: [screenshot not preserved]

Bonus:

You may also just match a pattern like \d,\d and pass each match to a MatchEvaluator, where you can manipulate the match further:

Console.WriteLine(Regex.Replace(s, @"\d,\d", m => m.Value.Replace(",",string.Empty))); // Callback method

Here, m is the match object and m.Value holds the whole match value. With .Replace(",",string.Empty), you remove all commas from the match value.
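A sketch of the callback approach on a few illustrative inputs. Because \d,\d consumes a digit on each side, it shares the 1,2,3,4 limitation of (\d),(\d). The WholeNumber helper is a hypothetical variation, not part of the answer: it matches an entire thousands-grouped number (one to three digits, then one or more ,ddd groups) and strips the commas inside it.

```csharp
using System;
using System.Text.RegularExpressions;

// The callback from the answer: every "\d,\d" match has its comma removed.
string Callback(string s) =>
    Regex.Replace(s, @"\d,\d", m => m.Value.Replace(",", string.Empty));

// Hypothetical variation: match a whole thousands-grouped number
// (1-3 digits, then one or more ",ddd" groups) and strip the commas inside it.
string WholeNumber(string s) =>
    Regex.Replace(s, @"\b\d{1,3}(?:,\d{3})+\b", m => m.Value.Replace(",", string.Empty));

Console.WriteLine(Callback("1,234,567"));    // 1234567
Console.WriteLine(Callback("1,2,3,4"));      // 12,34 (same limitation as (\d),(\d))
Console.WriteLine(WholeNumber("1,234,567")); // 1234567
Console.WriteLine(WholeNumber("1,2,3,4"));   // 1,2,3,4 (not thousands-grouped, left alone)
```

Note that none of these patterns can tell a thousands separator from a CSV field separator that happens to sit between two numeric fields: WholeNumber("5,600") also returns 5600.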

Upvotes: 3

Pietro Nadalini

Reputation: 1800

You can always test your pattern on a website that evaluates regular expressions. I think this code might help you:

str = Regex.Replace(str, @"[,](?=(\d+))(?<=(\d))", "");
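A quick check of this pattern, as a sketch on an illustrative input: because the lookbehind comes after [,] has consumed the comma, (?<=(\d)) inspects the character immediately to the left of that position, which is the comma itself, so the pattern never matches. Placing the lookbehind before the comma (the reordered string below is an editorial variation, equivalent in effect to the (?<=\d),(?=\d) pattern in the other answer) behaves as intended.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern as written in this answer: the lookbehind runs AFTER the comma is consumed,
// so (?<=(\d)) looks at the comma, never at the digit before it.
const string answerPattern = @"[,](?=(\d+))(?<=(\d))";
string unchanged = Regex.Replace("1,000", answerPattern, "");

// Reordered: the lookbehind is checked before the comma is consumed.
const string reordered = @"(?<=(\d))[,](?=(\d+))";
string stripped = Regex.Replace("1,000", reordered, "");

Console.WriteLine(unchanged); // 1,000 (no match, nothing removed)
Console.WriteLine(stripped);  // 1000
```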

Upvotes: 0
