Reputation: 799
I'm scraping prices and I want to ensure the price string contains only digits and a decimal point, with nothing extra such as currency symbols, spaces, or commas.
Right now I'm starting to do something like:
def parse_price(price):
    price = price.replace(' ', '')
    price = price.replace(',', '')
    return price
I don't like the look of this.
Allowed:
1.00
432.32
32324.03
Not allowed:
$1.00
3.43
C$32.55
£16.43
324,4343.20
Upvotes: 2
Views: 687
Reputation: 329
This can be solved fairly quickly with a regular expression. The easiest way would be:
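import re
txt = "$23A. 234."
# Raw strings (r"...") stop Python from treating \d, \$ and \. as string escapes.
r = re.compile(r"[^\d\$\.]")  # anything that is not a digit, '$' or '.'
x = r.sub('', txt)  # x is now "$23.234."
x = re.findall(r"\$[\d]*\.[\d]*", x)  # '$', digits, '.', digits
print(x)  # ['$23.234']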
This removes every character that is not a digit, a period, or a dollar sign. It then uses a pattern that matches a dollar sign, digits, a period, and more digits. Note: if the string contains more than one period, the match stops before the second one and nothing after it is captured. I may update this to fix that, but it should be good for now.
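To make the note about extra periods concrete, here is a small sketch of the same two steps wrapped in a function. The name `extract_dollar_amounts` and the second sample string are made up for illustration.

```python
import re

def extract_dollar_amounts(txt):
    # Hypothetical helper: same two steps as the snippet above.
    # Step 1: drop everything except digits, '$' and '.'.
    cleaned = re.sub(r"[^\d$.]", "", txt)
    # Step 2: match '$', optional digits, one period, optional digits.
    return re.findall(r"\$\d*\.\d*", cleaned)

print(extract_dollar_amounts("$23A. 234."))  # ['$23.234']
# With a second period, the match ends before it and the rest is dropped.
print(extract_dollar_amounts("$1.2.3"))      # ['$1.2']
```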
To fit OP's criteria, here's a revised version for no dollar sign and two decimal places:
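import re
txt = "$23A. 234."
# Raw strings (r"...") stop Python from treating \d and \. as string escapes.
r = re.compile(r"[^\d\.]")  # anything that is not a digit or '.'
x = r.sub('', txt)  # x is now "23.234."
x = re.findall(r"[\d]*\.\d\d", x)  # digits, '.', exactly two digits
print(x)  # ['23.23']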
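If the goal is to reject bad strings rather than clean them up, another option is to validate the whole string with `re.fullmatch`. This is a minimal sketch, not part of the approach above. It assumes a valid price is one or more ASCII digits, a period, and exactly two decimals; the name `is_plain_price` is hypothetical.

```python
import re

# Assumption: a valid price is ASCII digits, a period, then exactly two digits.
PRICE_RE = re.compile(r"[0-9]+\.[0-9]{2}")

def is_plain_price(price):
    # fullmatch succeeds only if the entire string fits the pattern,
    # so any currency symbol, space, or comma makes it fail.
    return PRICE_RE.fullmatch(price) is not None

for s in ["1.00", "432.32", "32324.03"]:
    print(s, is_plain_price(s))  # True for each

for s in ["$1.00", "C$32.55", "£16.43", "324,4343.20"]:
    print(s, is_plain_price(s))  # False for each
```

Unlike the substitution approach, this never alters the input, so a string like "$23A. 234." is simply rejected instead of being turned into a number.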
Upvotes: 2