Reputation: 402
I'm looking to extract prices from a string of scraped data.
I'm using this at the moment:
re.findall(r'£(?:\d+\.)?\d+.\d+', '£1.01')
['£1.01']
Which works fine 99% of the time. However, I occasionally see this:
re.findall(r'£(?:\d+\.)?\d+.\d+', '£1,444.01')
['£1,444']
Ideally, I'd like to see ['1444.01'].
This is an example of the string I'm extracting the prices from.
'\n £1,000.73 \n\n\n + £1.26\nUK delivery\n\n\n'
I'd like some help putting together a regex that returns ['1000.73', '1.26'] from the string above.
Upvotes: 7
Views: 2438
Reputation: 626747
You may grab all the values with r'£(\d[\d.,]*)\b' (note the raw string, so that \b is a word boundary rather than a backspace) and then remove all the commas:
import re
s = '\n £1,000.73 \n\n\n + £1.26\nUK delivery\n\n\n'
r = re.compile(r'£(\d[\d.,]*)\b')
print([x.replace(',', '') for x in re.findall(r, s)])
# => ['1000.73', '1.26']
The £(\d[\d.,]*)\b pattern finds £, then captures a digit followed by zero or more digits, commas, or dots, as many as possible. It then backtracks to the last position where there is a word boundary, so a trailing comma or dot is left out of the captured value.
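To illustrate the backtracking to a word boundary, here is a minimal sketch that wraps the same pattern in a helper. The name extract_prices and the sample sentence are made up for illustration; only the pattern and the comma removal come from the answer above.

```python
import re

# Same pattern as above: £, then a digit, then any run of digits, dots, or commas,
# ending at a word boundary so trailing punctuation is not captured.
PRICE_RE = re.compile(r'£(\d[\d.,]*)\b')

def extract_prices(text):
    """Return each £ amount in text as a string with thousands commas removed.

    extract_prices is a hypothetical helper name used only for this example.
    """
    return [value.replace(',', '') for value in PRICE_RE.findall(text)]

# The trailing ',' after 1,444.01 and the trailing '.' after 999 are dropped
# because \b forces the match to end right after a digit.
print(extract_prices('Was £1,444.01, now £999.'))
# => ['1444.01', '999']

print(extract_prices('\n £1,000.73 \n\n\n + £1.26\nUK delivery\n\n\n'))
# => ['1000.73', '1.26']
```

The results are still strings; convert them with float() or decimal.Decimal if you need to do arithmetic on the prices.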
Upvotes: 9