C.B

Reputation: 17

Python Regular Expression to match specific currency format

I'm trying to write a regular expression in Python 3.4 that takes potential prices from a text file and matches only those with valid formatting.

The requirements are that the price be in $X.YY or $X format where X must be greater than 0.

Invalid formats include $0.YY, $.YY, $X.Y, $X.YYY

So far this is what I have:

import re
from sys import argv

FILE = 1

file = open(argv[FILE], 'r')
string = file.read()
file.close()

price = re.compile(r"""         # beginning of string
                       (\$      # dollar sign
                       [1-9]    # first digit must be non-zero
                       \d * )   # followed by 0 or more digits
                       (\.       # optional cent portion
                       \d {2}  # only 2 digits allowed for cents
                         )?     # end of string""", re.X)

valid_prices = price.findall(string)
print(valid_prices)

This is the file I am using to test right now:

test.txt

 $34.23 $23 $23.23 $2 $2313443.23 $3422342 $02394 $230.232 $232.2 $05.03

Current output:

[('$34', '.23'), ('$23', ''), ('$23', '.23'), ('$2', ''), ('$2313443', '.23'), ('$3422342', ''), ('$230', '.23'), ('$232', '')]

It is currently matching $230.232 and $232.2 when these should be rejected.

I am separating the dollar portion and the cent portion into different groups to do further processing later on. That is why my output is a list of tuples.

One catch here is that I do not know what delimiter, if any, will be used in the input file.

I am new to regular expressions and would really appreciate some help. Thank you!

Upvotes: 1

Views: 3584

Answers (3)

Tim007

Reputation: 2557

Try this:

\$(?!0\d)\d+(?:\.\d{2})?(?=\s|$)


Matches:

$34.23 $23 $23.23 $2 $2313443.23 $3422342 $0.99 $3.00
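A quick check of this pattern against the test line from the question, as a sketch rather than a definitive verdict. Note that `(?!0\d)` only rejects a leading zero that is followed by another digit, so `$0.99` and `$0` still match even though the question lists `$0.YY` as invalid. Replacing `(?!0\d)\d+` with `[1-9]\d*` is one way to rule those out.

```python
import re

# Pattern from this answer, unchanged.
pattern = re.compile(r'\$(?!0\d)\d+(?:\.\d{2})?(?=\s|$)')

# Test line from the question.
sample = '$34.23 $23 $23.23 $2 $2313443.23 $3422342 $02394 $230.232 $232.2 $05.03'
matches = pattern.findall(sample)
print(matches)

# Zero-dollar amounts are still accepted by this pattern.
zero_matches = pattern.findall('$0.99 $0')
print(zero_matches)
```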

Upvotes: 0

Anton Harald

Reputation: 5944

If it's really not clear which delimiter will be used, the only check that makes sense to me is that the match is followed by "not a digit and not a dot":

\$[1-9]\d*(\.\d\d)?(?![\d.])

https://regex101.com/r/jH2dN5/1
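A minimal sketch of how this pattern behaves when the delimiter varies. The sample string is made up for illustration (commas, semicolons, and one price with no delimiter at all). Because the pattern has a capture group, `re.findall` would return only the cents, so the sketch collects full matches with `finditer` instead.

```python
import re

# Pattern from this answer, unchanged; the single capture group holds the cents.
pattern = re.compile(r'\$[1-9]\d*(\.\d\d)?(?![\d.])')

# Hypothetical input mixing commas, semicolons and no delimiter at all.
sample = '$34.23,$23;$230.232 $232.2$5 $05.03'

# findall() would return only the captured group, so collect full matches.
matches = [m.group(0) for m in pattern.finditer(sample)]
print(matches)
```

One consequence of excluding the dot: a price that ends a sentence, such as `$23.`, is rejected as well.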

Upvotes: 1

heemayl

Reputation: 42037

Add a zero-width positive lookahead (?=\s|$) to ensure that the match is followed only by whitespace or the end of the string:

>>> import re
>>> s = '$34.23 $23 $23.23 $2 $2313443.23 $3422342 $02394 $230.232 $232.2 $05.03'

>>> re.findall(r'\$[1-9]\d*(?:\.\d{2})?(?=\s|$)', s)
['$34.23', '$23', '$23.23', '$2', '$2313443.23', '$3422342']
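The question asks for the dollar and cent portions in separate groups. A sketch of the same lookahead combined with the two capture groups from the question's pattern might look like this (the name `price` follows the question; whitespace-delimited input is assumed):

```python
import re

# Two capture groups as in the question, plus the lookahead from this answer.
price = re.compile(r'(\$[1-9]\d*)(\.\d{2})?(?=\s|$)')

s = '$34.23 $23 $23.23 $2 $2313443.23 $3422342 $02394 $230.232 $232.2 $05.03'

# With two groups, findall() returns (dollars, cents) tuples;
# cents is an empty string when the optional group did not match.
pairs = price.findall(s)
print(pairs)
```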

Upvotes: 1
