Reputation: 17
I'm trying to write a regular expression in Python 3.4 that takes a text file of potential prices as input and matches only the validly formatted ones.
The requirements are that the price be in $X.YY or $X format where X must be greater than 0.
Invalid formats include $0.YY, $.YY, $X.Y, $X.YYY
So far this is what I have:
import re
from sys import argv
FILE = 1
file = open(argv[FILE], 'r')
string = file.read()
file.close()
price = re.compile(r""" # beginning of string
(\$ # dollar sign
[1-9] # first digit must be non-zero
\d * ) # followed by 0 or more digits
(\. # optional cent portion
\d {2} # only 2 digits allowed for cents
)? # end of string""", re.X)
valid_prices = price.findall(string)
print(valid_prices)
This is the file I am using to test right now:
test.txt
$34.23 $23 $23.23 $2 $2313443.23 $3422342 $02394 $230.232 $232.2 $05.03
Current output:
[('$34', '.23'), ('$23', ''), ('$23', '.23'), ('$2', ''), ('$2313443', '.23'), ('$3422342', ''), ('$230', '.23'), ('$232', '')]
It is currently matching $230.232 and $232.2 when these should be rejected.
I am separating the dollar portion and the cent portion into different groups to do further processing later on. That is why my output is a list of tuples.
One catch here is that I do not know what delimiter, if any, will be used in the input file.
I am new to regular expressions and would really appreciate some help. Thank you!
Upvotes: 1
Views: 3584
Reputation: 2557
Try this
\$(?!0\d)\d+(?:\.\d{2})?(?=\s|$)
Matches:
$34.23 $23 $23.23 $2 $2313443.23 $3422342 $0.99 $3.00
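A quick way to check this pattern is to run it over the question's test line with the two extra values from the list above appended. This is only a sketch, and the variable names are illustrative.

```python
import re

# Pattern from this answer: a dollar sign, then digits that do not start
# with "0" followed by another digit, optional two-digit cents, and a
# lookahead requiring whitespace or the end of the string.
pattern = re.compile(r'\$(?!0\d)\d+(?:\.\d{2})?(?=\s|$)')

# Test line from the question, with $0.99 and $3.00 appended.
sample = ('$34.23 $23 $23.23 $2 $2313443.23 $3422342 '
          '$02394 $230.232 $232.2 $05.03 $0.99 $3.00')

matches = pattern.findall(sample)
print(matches)
```

Because the lookahead only excludes a leading zero that is followed by another digit, $0.99 is accepted here, although the question lists $0.YY as invalid.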
Upvotes: 0
Reputation: 5944
If it's really not clear which delimiter will be used, the only check that makes sense to me is that the match is not followed by a digit or a dot:
\$[1-9]\d*(\.\d\d)?(?![\d.])
https://regex101.com/r/jH2dN5/1
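A short sketch of how this pattern behaves in Python on the question's test line. It uses finditer rather than findall because, with a capturing group in the pattern, findall would return only the group's text. The variable names are illustrative.

```python
import re

# Pattern from this answer: the negative lookahead rejects a match that is
# immediately followed by another digit or a dot.
pattern = re.compile(r'\$[1-9]\d*(\.\d\d)?(?![\d.])')

sample = ('$34.23 $23 $23.23 $2 $2313443.23 $3422342 '
          '$02394 $230.232 $232.2 $05.03')

# group(0) is the whole match; group(1) would be the optional cents part.
prices = [m.group(0) for m in pattern.finditer(sample)]
print(prices)
```

With this approach a price followed by a comma, such as `$23,`, still matches, while one followed directly by a period, such as `$23.` at the end of a sentence, does not.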
Upvotes: 1
Reputation: 42037
Add a zero-width positive lookahead, (?=\s|$),
to ensure that the match is followed only by whitespace or the end of the string:
>>> import re
>>> s = '$34.23 $23 $23.23 $2 $2313443.23 $3422342 $02394 $230.232 $232.2 $05.03'
>>> re.findall(r'\$[1-9]\d*(?:\.\d{2})?(?=\s|$)', s)
['$34.23', '$23', '$23.23', '$2', '$2313443.23', '$3422342']
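To keep the dollar and cent portions in separate groups, as the question asks, the same lookahead can be added to the original verbose pattern. The following is a sketch under that assumption; `PRICE` and `find_prices` are illustrative names, not part of the answer above.

```python
import re

PRICE = re.compile(r"""
    (\$[1-9]\d*)    # group 1: dollar sign and a non-zero leading digit
    (\.\d{2})?      # group 2: optional cents, exactly two digits
    (?=\s|$)        # must be followed by whitespace or end of string
    """, re.X)


def find_prices(text):
    """Return (dollars, cents) tuples for every valid price in text."""
    return PRICE.findall(text)


s = '$34.23 $23 $23.23 $2 $2313443.23 $3422342 $02394 $230.232 $232.2 $05.03'
print(find_prices(s))
```

With two capturing groups, findall returns a list of tuples, and the cents element is an empty string when the price has no cent portion.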
Upvotes: 1