Leon Kyriacou

Reputation: 402

Extracting Prices with Regex

I'm looking to extract prices from a string of scraped data.

I'm using this at the moment:

re.findall(r'£(?:\d+\.)?\d+.\d+', '£1.01')
['1.01']

Which works fine 99% of the time. However, I occasionally see this:

re.findall(r'£(?:\d+\.)?\d+.\d+', '£1,444.01')
['1,444']

I'd like to see ['1444.01'] ideally.

This is an example of the string I'm extracting the prices from.

'\n                £1,000.73                \n\n\n                + £1.26\nUK delivery\n\n\n'

I'm after some help putting together a regex to get ['1000.73', '1.26'] from the string above.

Upvotes: 7

Views: 2438

Answers (1)

Wiktor Stribiżew

Reputation: 626747

You may grab all the values with '£(\d[\d.,]*)\b' and then remove all the commas with str.replace:

import re
s = '\n                £1,000.73                \n\n\n                + £1.26\nUK delivery\n\n\n'
r = re.compile(r'£(\d[\d.,]*)\b')
print([x.replace(',', '') for x in re.findall(r, s)])
# => ['1000.73', '1.26']

See the Python demo

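The £(\d[\d.,]*)\b pattern matches £, then captures one digit followed by zero or more digits, commas, or periods, as many as possible. If the greedy match ends on a comma or period that is not followed by a word character, \b fails there and the engine backtracks to the nearest position where a word boundary exists, so trailing punctuation is left out of the capture.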
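To illustrate the backtracking described above, here is a minimal sketch that wraps the same pattern in a helper and runs it on text where a price is followed by sentence punctuation. The name extract_prices and the sample sentence are made up for illustration; they are not part of the original question.

```python
import re

# Same pattern as in the answer: £, one digit, then any run of digits, '.' or ',',
# ending at a word boundary.
PRICE_RE = re.compile(r'£(\d[\d.,]*)\b')

def extract_prices(text):
    """Hypothetical helper: return the captured prices with commas removed."""
    return [m.replace(',', '') for m in PRICE_RE.findall(text)]

# The character class first swallows the trailing ',' and '.', but \b cannot
# match after them, so the engine backs off to the last digit of each price.
print(extract_prices('Was £1,444.01, now £999.'))
# => ['1444.01', '999']
```

Note that this only strips commas; it does not check that the commas and periods are in sensible positions, so the scraped data is assumed to use UK-style formatting throughout.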

Upvotes: 9
