Reputation: 959
I'm trying to find ranges of properly formatted currency or numbers in a string with regular expressions. I happen to be using C#, so the regex is formatted that way.
For example, I want to be able to find:
$10,000,000 to $20M
$10k-$20k
100.23k - 200.34k
$20000 and $500600
3456646 to 4230405
It should not match on:
$10,0000,000 to $20,000,000 //extra zero in first number
20k xyz 40k //middle string does not match a range word
Here is my regular expression so far:
(^|\s|\$)([1-9](?:\d*|(?:\d{0,2})(?:,\d{3})*)(?:\.\d*[1-9])?|0?\.\d*[1-9]|0)(|m|k)(?:|\s)(?:|to|and|-|,)(?:|\s)(|\$)([1-9](?:\d*|(?:\d{0,2})(?:,\d{3})*)(?:\.\d*[1-9])?|0?\.\d*[1-9]|0)(\s|m|k)
It seems to be working fairly well, but sometimes matches items I don't expect it to. Examples:
1985 xyz 1999 //2 matches, both numbers without xyz
$10,000,000 xyz $20000000 //1 match on the $2000000
$10,000,0000 to $20,000,000 //1 match on the $10,000,0000 (extra zero on end)
What am I missing? Is it foolish to do this with regex?
Upvotes: 4
Views: 1047
Reputation: 19987
Here you go buddy
(?<=^|\s)\$?\d+((\.\d{2})?(k|M)|(,\d{3})*)\b\s*(to|-|and )\s*\$?\d+((\.\d{2})?(k|M)|(,\d{3})*)(\s|$)
see it in action.
This part
\d+((\.\d{2})?(k|M)|(,\d{3})*)
is repeated, so it's better to store it in a constant and build the regex by concatenation.
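// The optional \$? belongs to each amount, as in the full regex above.
string moneyPattern = @"\$?\d+((\.\d{2})?(k|M)|(,\d{3})*)";
string rangeConnectorPattern = @"\b\s*(to|-|and\b)\s*";
string moneyRangePattern = @"(?<=^|\s)" +
    moneyPattern + rangeConnectorPattern + moneyPattern +
    @"(\s|$)"; // verbatim string, otherwise \s is an invalid C# escape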
No need to write a parser.
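To check the composed pattern against the examples from the question, something like the following sketch could be used. The class name `MoneyRangeDemo` and the helper `IsRange` are made up for illustration, and the sketch assumes the optional `\$?` prefix is part of each amount, as in the full regex above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class MoneyRangeDemo
{
    // One amount: optional "$", digits, then either an optional two-digit
    // fraction with a k/M suffix, or zero or more ",ddd" groups.
    // Assumption: the "$" is optional on both sides of the range.
    const string MoneyPattern = @"\$?\d+((\.\d{2})?(k|M)|(,\d{3})*)";

    // The word or symbol between the two amounts, with optional whitespace.
    const string RangeConnectorPattern = @"\b\s*(to|-|and\b)\s*";

    // The range must start at the beginning of the input or after whitespace,
    // and end at whitespace or the end of the input.
    const string MoneyRangePattern =
        @"(?<=^|\s)" + MoneyPattern + RangeConnectorPattern + MoneyPattern + @"(\s|$)";

    static readonly Regex MoneyRange = new Regex(MoneyRangePattern);

    // Returns true if the input contains at least one amount range.
    public static bool IsRange(string input)
    {
        return MoneyRange.IsMatch(input);
    }
}
```

With this, `MoneyRangeDemo.IsRange("$10k-$20k")` and `MoneyRangeDemo.IsRange("$10,000,000 to $20M")` return true, while `MoneyRangeDemo.IsRange("20k xyz 40k")` and `MoneyRangeDemo.IsRange("$10,0000,000 to $20,000,000")` return false. Note that the pattern does not cap the leading digit group at three digits, so something like 1000,000 would still be accepted as an amount.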
Upvotes: 2