Extracting Prices with Regex

Question

I'm looking to extract prices from a string of scraped data.

I'm using this at the moment:

re.findall(r'£(?:\d+\.)?\d+.\d+', '£1.01')
['1.01']

Which works fine 99% of the time. However, I occasionally see this:

re.findall(r'£(?:\d+\.)?\d+.\d+', '£1,444.01')
['1,444']

I'd like to see ['1444.01'] ideally.

This is an example of the string I'm extracting the prices from.

'
                £1,000.73                


                + £1.26
UK delivery


'

I'm after some help putting together a regex that gets ['1000.73', '1.26'] from the string above.

Wiktor Stribiżew · Accepted Answer

You can grab all the values with '£(\d[\d.,]*)\b' and then remove the commas with str.replace:

import re
s = '''
                £1,000.73                


                + £1.26
UK delivery


'''
r = re.compile(r'£(\d[\d.,]*)\b')
# Capture each amount, then strip the thousands separators
print([x.replace(',', '') for x in r.findall(s)])
# => ['1000.73', '1.26']

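The £(\d[\d.,]*)\b pattern matches £, then captures one digit followed by zero or more digits, commas, or periods, as many as possible. The trailing \b then forces the engine to backtrack to the nearest position where a word boundary occurs, so the capture always ends on a digit and trailing punctuation (for example, a sentence-ending period) is left out.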
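
To illustrate the backtracking, here is a minimal sketch that runs the same pattern on a string where the prices are followed by punctuation. The sample string, the PRICE_RE name, and the extract_prices helper are made up for this illustration and are not part of the question; converting to Decimal is an optional extra step, assuming you want numeric values rather than strings.

```python
import re
from decimal import Decimal

# Same pattern as above; PRICE_RE is just an illustrative name.
PRICE_RE = re.compile(r'£(\d[\d.,]*)\b')

def extract_prices(text):
    """Hypothetical helper: return each £ amount in text as a Decimal."""
    # Strip thousands separators before converting to a number.
    return [Decimal(m.replace(',', '')) for m in PRICE_RE.findall(text)]

# Made-up sample: each price is followed by '.' or ','.
sample = 'Total: £1,444.01. Delivery: £1.26, VAT included.'

# The greedy class first swallows the trailing '.' or ',', then \b
# makes the engine back off so that the capture ends on a digit.
print(PRICE_RE.findall(sample))   # ['1,444.01', '1.26']
print(extract_prices(sample))     # [Decimal('1444.01'), Decimal('1.26')]
```

Note that the character class accepts any mix of digits, commas, and periods, so a malformed value such as 1.2.3 would still be captured, and Decimal would then raise decimal.InvalidOperation on it.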
