Reputation: 44
For example, I would like my regex to capture both "1 dollar" when there are no cents and "2 dollars and 71 cents" when there are. I currently have
'\d[\d]*( dollar(s)?)(?:\s*(and )\d[\d]( cent(s)?)\b)?'
and I have tested it here: regexr.com/67etd. It seems to work there, but when I run it in Python, what the regex captures is
(' dollars', 's', '', '', '')
I apologize, I am very new to regex. Does anyone have any suggestions?
Here is my Python code:
import re
train = open(r"C:\Users\inigo\PycharmProjects\pythonProject\all-OANC.txt", encoding='utf8')
# didn't have encoding lol
# opens the files
strain = train.read()
# converts the files into a string
train.close()
# Finds all numbers starting with $ (commas and decimals allowed), optionally followed by "million" or "billion"
#pattern = re.compile(r'\$\d[\d,.]*\b(?:\s*million\b)?(?:\s*billion\b)?')
# We also need a pattern that counts a number when it is followed by the "dollar" keyword
pattern2 = re.compile('\d[\d]*( dollar(s)?)(?:\s*(and )\d[\d]*( cent(s)?)\b)?')
matches = pattern2.findall(strain)
for match in matches:
    print(match)
Upvotes: 2
Views: 240
Reputation: 11611
Try this regex:
\b\d+\s+dollars?(?:\s+and\s+\d+\s+cents?)?\b
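A minimal sketch of how this pattern behaves with re.findall, assuming a made-up sample sentence (not taken from the asker's corpus). Because the pattern contains only a non-capturing group, findall returns the whole matched text rather than tuples of groups. The pattern is written as a raw string, since in an ordinary Python string literal \b means a backspace character instead of a word boundary.

```python
import re

# Raw string: without the r prefix, "\b" would be a backspace, not a word boundary.
pattern = re.compile(r'\b\d+\s+dollars?(?:\s+and\s+\d+\s+cents?)?\b')

# Hypothetical sample text, used only for illustration.
sample = "It cost 1 dollar then, but 2 dollars and 71 cents now."

# No capturing groups in the pattern, so findall yields full matches.
matches = pattern.findall(sample)
print(matches)  # ['1 dollar', '2 dollars and 71 cents']
```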
Upvotes: 1
Reputation: 12668
In your regexp:
\d[\d]*( dollar(s)?)(?:\s*(and )\d[\d]( cent(s)?)\b)?
       ^       ^ ^ ^^     ^    ^      ^     ^ ^ ^  ^
       |       (2) ||     +(3)-+      |     (5) |  |
       +----(1)----+|                 +---(4)---+  |
                    +--------(non-capturing)-------+
these are the numbers of the groups you can submatch. You have five capturing groups, numbered by the position of their left parenthesis in the regexp; the (?:...) group is non-capturing and gets no number. When a pattern contains capturing groups, findall returns a tuple of those groups instead of the whole match, which explains the output you describe. If you want the numbers, you need to add parentheses around the subexpressions of interest, so that they are captured in groups of their own, this way:
(\d[\d]*)( dollar(s)?)(?:\s*(and )(\d[\d])( cent(s)?)\b)?
^       ^^       ^ ^ ^^     ^    ^^      ^^     ^ ^ ^  ^
+--(1)--+|       (3) ||     +(4)-++--(5)-+|     (7) |  |
         +----(2)----+|                   +---(6)---+  |
                      +---------(non-capturing)--------+
Now you have seven groups, and you will find the dollar amount in group 1 and the cent amount in group 5.
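As a rough illustration of reading the amounts back out, the sketch below uses re.finditer with the regrouped pattern on a made-up sample sentence. It counts only capturing parentheses (the (?:...) wrapper is not numbered), so the dollar amount is group 1 and the cent amount is group 5. Note that the cents part, as written, accepts exactly two digits, and that the pattern needs a raw string so \b is a word boundary.

```python
import re

# Regrouped pattern with the two number parts captured; raw string keeps "\b" a word boundary.
pattern = re.compile(r'(\d[\d]*)( dollar(s)?)(?:\s*(and )(\d[\d])( cent(s)?)\b)?')

# Hypothetical sample text, used only for illustration.
sample = "It cost 1 dollar then, but 2 dollars and 71 cents now."

amounts = []
for m in pattern.finditer(sample):
    dollars = m.group(1)
    cents = m.group(5)  # None when the optional "and NN cents" part is absent
    amounts.append((m.group(0), dollars, cents))

print(amounts)
# [('1 dollar', '1', None), ('2 dollars and 71 cents', '2', '71')]
```

Using finditer rather than findall gives access to both the full match, via group(0), and the individual groups.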
Upvotes: 0