How to write regex to capture specific number formats and exclude the rest?

Question

I am trying to capture a limited set of valid cases from a string that also contains many invalid number formats, using Python regex. The valid cases are numbers with comma thousands separators, optionally followed by a decimal part. Everything else is invalid. A sample is below.

Sample input string:

input = "The net value is 1,000,000.00 however even 100,000 or 1,000,000 is acceptable. The amounts that are not acceptable are 1 000,000.00 or 1.000.000.00 or 1,000,000,00 or 1,000,000,0000"

Expected output: 1,000,000.00 100,000 1,000,000

The current python regex I tried is as follows:

\d{1,3}(,{1}\d{3})*(\.{1}\d+){0,1}$

This only works when the input is just numbers. When I try to input numbers with words around them it fails.

Tim Biegeleisen · Accepted Answer

The following regex pattern gets closer to what you want here:

(?


This uses lookarounds to assert that each number is bounded on both sides by either whitespace or the start/end of the input. Also note that we insist that each valid number not start with zero.
I would use re.findall as follows:
inp = "The net value is 1,000,000.00 however even 100,000 or 1,000,000 is acceptable. The amounts that are not acceptable are 1 000,000.00 or 1.000.000.00 or 1,000,000,00 or 1,000,000,0000"
matches = re.findall(r'(?

This prints:
['1,000,000.00', '100,000', '1,000,000', '1']

As a note on why 1 appears as a result above, there is no obvious way to know that the standalone number 1 is actually part of the broken one million number.
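
The pattern is cut off in the text above, so here is a minimal sketch that is consistent with the description (whitespace or start/end boundaries via lookarounds, no leading zero). The exact pattern and the name `NUMBER_RE` are assumptions for illustration, not necessarily the original answer's regex:

```python
import re

# Assumed reconstruction based on the description above; the original
# pattern may differ in its details.
NUMBER_RE = re.compile(
    r"(?<!\S)"        # preceded by whitespace or the start of the input
    r"[1-9]\d{0,2}"   # one to three leading digits, not starting with zero
    r"(?:,\d{3})*"    # zero or more comma-separated groups of three digits
    r"(?:\.\d+)?"     # optional decimal part
    r"(?!\S)"         # followed by whitespace or the end of the input
)

inp = (
    "The net value is 1,000,000.00 however even 100,000 or 1,000,000 is "
    "acceptable. The amounts that are not acceptable are 1 000,000.00 or "
    "1.000.000.00 or 1,000,000,00 or 1,000,000,0000"
)

# All groups are non-capturing, so findall returns the full matches.
matches = NUMBER_RE.findall(inp)
print(matches)
```

With this sketch the stray 1 from "1 000,000.00" is still matched, for the reason given above: on its own it is a well-formed number surrounded by whitespace.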
