Sam Machin

Reputation: 3223

How does one find the currency value in a string?

I'm writing a small tool to extract a bunch of values from a string (usually a tweet).

The string could consist of words and numbers, along with an amount prefixed by a currency symbol (£, $, €, etc.) and a number of hashtags (#foo #bar). I'm running on App Engine and using tweepy to bring in the tweets.

The current code I have to find the values is below:

import logging
import re

tagex = re.compile(r'#.*')
curex = re.compile(ur'[£].*')
for x in api.user_timeline(since_id = t.lastimport):
    tags = re.findall(tagex, x.text)
    amount = re.findall(curex, x.text)[0]
    logging.info("Text: " + x.text)
    logging.info("Tags: " + str(tags))
    logging.info("Amount: " + amount)

where x.text is for example "Taxi London £6.50 #projectfoo #clientmeeting"

The tagex finds the hashtags fine, but I can't get curex to extract just the amount. Currently I get: Amount: £6.50 #projectfoo #clientmeeting.

I also need to separate off the currency symbol so as to get the amount as a float, but that should be pretty simple later.

Upvotes: 4

Views: 11250

Answers (3)

Jeril

Reputation: 8521

If you are okay with installing an additional Python package named price-parser, you can try the following:

Install the package

python -m pip install price-parser

Code to get the currency and amount

from price_parser import Price
result = Price.fromstring("Taxi London £6.50 #projectfoo #clientmeeting")
print(result)

Output:

Price(amount=Decimal('6.50'), currency='£')

Upvotes: 0

iChux

Reputation: 2386

I've altered Marcog's regex a bit


    re.search(ur'([£\$€])(\d+(?:\.\d{2})?)', s).groups()

by escaping the dollar sign.
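
As a side note, Python's re module treats most special characters as literals inside a character class, so the escaped and unescaped forms should behave the same here. A minimal sketch to check this, assuming Python 3 (where plain str patterns are already Unicode and the ur prefix is no longer valid) and a made-up sample string:

```python
import re

# Hypothetical sample text containing two different currency symbols.
text = "Coffee $3.20 and lunch £6.50"

# Same pattern with and without the backslash before the dollar sign.
unescaped = re.findall(r'([£$€])(\d+(?:\.\d{2})?)', text)
escaped = re.findall(r'([£\$€])(\d+(?:\.\d{2})?)', text)

print(unescaped)             # [('$', '3.20'), ('£', '6.50')]
print(unescaped == escaped)  # True
```

Escaping is harmless and some people find it more readable, so either form works.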

Upvotes: 2

moinudin

Reputation: 138357

>>> re.search(ur'([£$€])(\d+(?:\.\d{2})?)', s).groups()
(u'\xa3', u'6.50')
  • [£$€] matches one currency symbol
  • \d+(?:\.\d{2})? matches one or more digits, optionally followed by a decimal point and exactly two digits
  • The ()'s capture the symbol and amount separately

The problem with your regex is that .* matches anything and is greedy, so at the end of a regex it matches everything that follows.
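
Putting this together, here is a rough sketch of how the extraction might look, including the float conversion mentioned in the question. It assumes Python 3, so the ur prefix is dropped. The names parse_expense, CURRENCY_RE and HASHTAG_RE are made up for illustration, and the #\w+ hashtag pattern is a suggested replacement rather than part of the original code: #.* has the same greediness issue and returns all the hashtags as one string.

```python
import re

# Assumption: Python 3, where str patterns are Unicode by default.
CURRENCY_RE = re.compile(r'([£$€])(\d+(?:\.\d{2})?)')
# Suggested alternative to '#.*': match each hashtag as its own token.
HASHTAG_RE = re.compile(r'#\w+')


def parse_expense(text):
    """Return (symbol, amount, tags); symbol and amount are None if no price is found."""
    match = CURRENCY_RE.search(text)
    if match:
        symbol, amount = match.group(1), float(match.group(2))
    else:
        symbol, amount = None, None
    tags = HASHTAG_RE.findall(text)
    return symbol, amount, tags


print(parse_expense("Taxi London £6.50 #projectfoo #clientmeeting"))
# ('£', 6.5, ['#projectfoo', '#clientmeeting'])
```

If the amounts are going to be summed as money, decimal.Decimal may be a safer choice than float, since it avoids binary rounding errors.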

Upvotes: 17
