Reputation: 1124
I wanted to match the numeric values of a string:
1,000 metric tonnes per contract month
Five cents ($0.05) per tonne
Five cents ($0.05) per tonne
1,000 metric tonnes per contract month
My current approach:
size = re.findall(r'(\d+(,?\d*).*?)', my_string)
What I get with my approach:
print size
[(u'1,000', u',000')]
As you can see, the number 1 was being cut out from the second element of the list. Why is that? Also, could I get a hint as to how I can match the $0.05 terms?
Upvotes: 4
Views: 110
Reputation: 582
I would try this regex:
r'[0-9]+(?:,[0-9]+)*(?:\.[0-9]+)?'
Add \$? at the beginning to optionally catch the $
Upvotes: 0
Reputation: 250881
Something like this:
>>> import re
>>> strs = """1,000 metric tonnes per contract month
Five cents ($0.05) per tonne
Five cents ($0.05) per tonne
1,000 metric tonnes per contract month"""
>>> [m.group(0) for m in re.finditer(r'\$?\d+([,.]\d+)?', strs)]
['1,000', '$0.05', '$0.05', '1,000']
Demo : http://rubular.com/r/UomzIY3SD3
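A small sketch of why `m.group(0)` is used here instead of `re.findall()`, plus one limitation worth knowing about. The helper name `whole_matches` and the extra test strings are made up for illustration; only the pattern comes from the answer above.

```python
import re

# Pattern from the answer above; it contains one capturing group.
pattern = re.compile(r'\$?\d+([,.]\d+)?')

def whole_matches(text):
    # group(0) is always the full match, regardless of capturing groups.
    return [m.group(0) for m in pattern.finditer(text)]

# finditer + group(0) gives the complete number, including the "$".
dollar = whole_matches("Five cents ($0.05) per tonne")

# findall with the same pattern returns only the captured group.
groups_only = pattern.findall("1,000 tonnes")

# Limitation: only one separator is allowed, so longer numbers get split.
split_number = whole_matches("1,000,000 tonnes")

print(dollar)        # ['$0.05']
print(groups_only)   # [',000']
print(split_number)  # ['1,000', '000']
```

If numbers with more than one separator can show up in your data, a pattern that repeats the thousands group is probably the safer choice.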
Upvotes: 3
Reputation: 11233
Try this regex:
(\$?\d+(?:[,.]?\d*(?:\.\d+)?)).*?
Upvotes: 0
Reputation: 336098
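re.findall()
returns the capturing groups instead of the whole match: a list of tuples, one tuple per match, when the pattern contains more than one group. Each set of normal parentheses generates one such group. To get the whole match back, use non-capturing groups and write your regex like this: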
size = re.findall(r'\d{1,3}(?:,\d{3})*(?:\.\d+)?', my_string)
Explanation:
\d{1,3} # One to three digits
(?:,\d{3})* # Optional thousands groups
(?:\.\d+)? # Optional decimal part
This assumes that all numbers have commas as thousands separators, i.e. no numbers like 1000000. If you need to match those too, use
size = re.findall(r'\d+(?:,\d{3})*(?:\.\d+)?', my_string)
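To see the difference side by side, here is a minimal sketch run against the sample text from the question. The leading `\$?` is an addition (not part of the regex above) so that the dollar sign is kept for the $0.05 terms. The sketch uses Python 3 `print()`.

```python
import re

my_string = """1,000 metric tonnes per contract month
Five cents ($0.05) per tonne
Five cents ($0.05) per tonne
1,000 metric tonnes per contract month"""

# Capturing groups: findall returns one tuple of groups per match.
with_groups = re.findall(r'(\d+(,?\d*))', my_string)

# Non-capturing groups: findall returns each whole match as a string.
# The optional \$? prefix is an assumption added for the dollar amounts.
numbers = re.findall(r'\$?\d+(?:,\d{3})*(?:\.\d+)?', my_string)

print(with_groups[0])  # ('1,000', ',000')
print(numbers)         # ['1,000', '$0.05', '$0.05', '1,000']
```

The second element of the tuple is just the inner group `(,?\d*)`, which is why the leading 1 appears to be missing from it.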
Upvotes: 3