Inigo Hohmeyer

Reputation: 44

How do I get a regex expression to capture "x dollar(s) and x cents"?

For example, I would like my regex to capture both "1 dollar" when there are no cents and "2 dollars and 71 cents" in my text. I currently have

\d[\d]*( dollar(s)?)(?:\s*(and )\d[\d]( cent(s)?)\b)?

I have tested it at regexr.com/67etd and it seems to work there, but when I run it in Python, what the regex captures is

(' dollars', 's', '', '', '')

I apologize, I am very new to regex. Does anyone have any suggestions?

Here is my Python code:

import re

# Read the whole file into a string (encoding given explicitly).
with open(r"C:\Users\inigo\PycharmProjects\pythonProject\all-OANC.txt", encoding='utf8') as train:
    strain = train.read()

# Earlier pattern: numbers starting with $ that can include commas and decimals,
# optionally followed by "million" or "billion".
#pattern = re.compile(r'\$\d[\d,.]*\b(?:\s*million\b)?(?:\s*billion\b)?')

# Numbers followed by the keyword "dollar(s)", optionally with "and N cent(s)".
# Raw string, so that \b is a word boundary rather than a backspace character.
pattern2 = re.compile(r'\d[\d]*( dollar(s)?)(?:\s*(and )\d[\d]*( cent(s)?)\b)?')

matches = pattern2.findall(strain)

for match in matches:
    print(match)

Upvotes: 2

Views: 240

Answers (3)

Wizard.Ritvik

Reputation: 11611

Try this regex:

\b\d+\s+dollars?(?:\s+and\s+\d+\s+cents?)?\b
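
As a rough sketch of how this pattern might behave in Python: because it contains only a non-capturing group `(?:...)`, `re.findall` returns each whole match as a string rather than a tuple of groups. The `sample` text below is made up for illustration and stands in for the contents of the file from the question.

```python
import re

# Raw string, so \b reaches the regex engine as a word boundary.
pattern = re.compile(r'\b\d+\s+dollars?(?:\s+and\s+\d+\s+cents?)?\b')

# Hypothetical sample text (an assumption, not taken from the question's file).
sample = "It cost 1 dollar at first, then 2 dollars and 71 cents later."

# No capturing groups, so findall yields the full matched strings.
matches = pattern.findall(sample)
for match in matches:
    print(match)
```

With a pattern like this, the loop from the question could stay unchanged; only the compiled pattern differs.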


Upvotes: 1

Luis Colorado

Reputation: 12668

In your regexp:

\d[\d]*( dollar(s)?)(?:\s*(and )\d[\d]( cent(s)?)\b)?
       ^       ^ ^ ^^     ^    ^      ^     ^ ^ ^  ^
       |       (2) ||     +(3)-+      |     (5) |  |
       +----(1)----+|                 +-----(4)-+  |
                    +-------(non-capturing)--------+

these are the numbers of the groups you can submatch. Capturing groups are numbered by the position of their left parenthesis in the regexp, and the (?:...) group is non-capturing, so it gets no number. That leaves five capturing groups, none of which contains the digits, which explains the five-element tuple you get for the input string you matched. If you want the numbers, you need to add parentheses around the subexpressions of interest, so that they end up in groups of their own, this way:

(\d[\d]*)( dollar(s)?)(?:\s*(and )(\d[\d])( cent(s)?)\b)?
^       ^^       ^ ^ ^^     ^    ^^      ^^     ^ ^ ^  ^
+--(1)--+|       (3) ||     +(4)-++--(5)-+|     (7) |  |
         +----(2)----+|                   +----(6)--+  |
                      +--------(non-capturing)---------+

Now you have seven capturing groups: the dollar amount is in group 1 and the cents amount is in group 5.
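
To illustrate the numbering, here is a small sketch that prints the groups of the second pattern for a made-up input (`text` is an assumption for the example). It uses `re.search` and `Match.group` rather than `findall`. Counting only capturing groups (the `(?:...)` group gets no number), the dollar digits land in group 1 and the cents digits in group 5. Note also that `\d[\d]` as written matches exactly two digits, so a one-digit amount such as "5 cents" would not be picked up by this pattern.

```python
import re

# Second pattern from this answer, written as a raw string.
pattern = re.compile(r'(\d[\d]*)( dollar(s)?)(?:\s*(and )(\d[\d])( cent(s)?)\b)?')

# Hypothetical input used only for illustration.
text = "2 dollars and 71 cents"

m = pattern.search(text)
dollars = m.group(1)  # digits before " dollar(s)"
cents = m.group(5)    # digits before " cent(s)"; None when the cents part is absent
print(m.groups())
print(dollars, cents)
```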

Upvotes: 0

Jab

Reputation: 27495

You could use this regex:

'(\d+ dollars?)(\s+and\s+\d{1,2} cents?)?'
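
A brief sketch of one way this pattern might be used, assuming a made-up `sample` string: since the pattern has two capturing groups, `findall` returns 2-tuples, so the example also iterates with `finditer` and reads `group(0)` to get the whole matched text.

```python
import re

pattern = re.compile(r'(\d+ dollars?)(\s+and\s+\d{1,2} cents?)?')

# Hypothetical sample text for illustration.
sample = "I paid 1 dollar, and she paid 2 dollars and 71 cents."

# findall returns one tuple per match, with '' for a group that did not participate.
tuples = pattern.findall(sample)

# group(0) is the entire match, regardless of how many groups the pattern has.
full = [m.group(0) for m in pattern.finditer(sample)]
print(tuples)
print(full)
```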

Upvotes: 1
