nandesuka

Reputation: 799

Remove everything but numbers and decimals from string

I'm scraping prices and I want to ensure the price string contains nothing but digits and a decimal point: no currency symbols, spaces, or thousands separators.

Right now I'm starting to do something like:

def parse_price(price):
    price = price.replace(' ', '')
    price = price.replace(',', '')
    return price

I don't like the look of this.

Allowed:

1.00
432.32
32324.03

Not allowed:

$1.00
 3.43
C$32.55
£16.43
324,4343.20

Upvotes: 2

Views: 687

Answers (1)

Gabe Ron

Reputation: 329

This can be solved pretty quickly with a regular expression. The easiest way would be:

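import re

txt = "$23A. 234."
# Drop everything that is not a digit, a dollar sign, or a period.
r = re.compile(r"[^\d$.]")
x = r.sub('', txt)  # "$23.234."
# Match a dollar sign, digits, a period, then more digits.
x = re.findall(r"\$\d*\.\d*", x)
print(x)  # ['$23.234']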

This removes every character that is not a digit, a period, or a dollar sign, then uses a second pattern to match a dollar sign, digits, a period, and more digits. Note: if there are any more periods after the first, the match stops at the second period and nothing after it is captured. I may update the answer to fix this, but this should be good for now.
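The two steps can be traced with a short sketch. The helper name extract_dollar_amounts and the second sample input are made up for illustration; only the first input comes from the answer.

```python
import re

def extract_dollar_amounts(text):
    """Hypothetical helper: return the cleaned string and the dollar amounts found in it."""
    # Step 1: keep only digits, dollar signs, and periods.
    cleaned = re.sub(r"[^\d$.]", "", text)
    # Step 2: match a dollar sign, digits, a period, then more digits.
    return cleaned, re.findall(r"\$\d*\.\d*", cleaned)

cleaned, matches = extract_dollar_amounts("$23A. 234.")
print(cleaned)  # $23.234.
print(matches)  # ['$23.234']

# An extra period cuts the match short: only "$1.2" is captured.
print(extract_dollar_amounts("$1.2.3")[1])  # ['$1.2']
```

The trailing period in the first input is dropped, and in the second input everything from the second period onward is lost, which is the limitation mentioned in the note.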

Edit:

To fit OP's criteria, here's a revised version for no dollar sign and two decimal places:

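import re

txt = "$23A. 234."
# Drop everything that is not a digit or a period.
r = re.compile(r"[^\d.]")
x = r.sub('', txt)  # "23.234."
# Match digits, a period, then exactly two digits.
x = re.findall(r"\d*\.\d\d", x)
print(x)  # ['23.23']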
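For the inputs listed in the question, the stripping step alone may already be enough. Here is a minimal sketch that reuses the question's parse_price name. It assumes each string holds exactly one price and that every period is a decimal point, so it does no validation of the result.

```python
import re

def parse_price(price):
    """Strip every character except ASCII digits and periods."""
    return re.sub(r"[^0-9.]", "", price)

# The "not allowed" examples from the question.
for raw in ["$1.00", " 3.43", "C$32.55", "£16.43", "324,4343.20"]:
    print(repr(raw), "->", parse_price(raw))
```

This uses [^0-9.] rather than [^\d.] because \d in a Python 3 str pattern also matches non-ASCII digits, which a price parser probably does not want to keep.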

Upvotes: 2
