Reputation: 43
If I use the money_conversion function on $17 million, it returns 17000000, and so on. Only when the number is a single digit does it return an incorrect result, i.e. $7 million converts to 7 instead of 7000000.
import re

number = r'\d+(,\d{3})*\.*\d*'  # $790,000
amount = r'thousand|million|billion'  # $12.2 million example
word_re = rf'\${number}(-|\sto\s|–)?(\$*{number})\s?({amount})'
value_re = rf'\${number}'

def parse_word_syntax(string):
    value_string = re.search(number, string).group()
    value = float(value_string.replace(',', ''))
    word = re.search(amount, string, flags=re.I).group().lower()
    word_value = word_to_value(word)
    return value * word_value

def word_to_value(word):
    value_dict = {'thousand': 1000, 'million': 1000000, 'billion': 1000000000}
    return value_dict[word]

def parse_value_syntax(string):
    value_string = re.search(number, string).group()
    value = float(value_string.replace(',', ''))
    return value

def money_conversion(money):
    if money == 'N/A':
        return None
    if isinstance(money, list):
        money = money[0]
    word_syntax = re.search(word_re, money, flags=re.I)
    value_syntax = re.search(value_re, money)
    if word_syntax:
        print('converting word object to numerics')
        return parse_word_syntax(word_syntax.group())
    elif value_syntax:
        print('converting float objects to numerics')
        return parse_value_syntax(value_syntax.group())
    else:
        return None
Upvotes: 1
Views: 54
Reputation: 626927
The reason is quite simple: your string does not match the word_re regex, which expands to \$\d+(,\d{3})*\.*\d*(-|\sto\s|–)?(\$*\d+(,\d{3})*\.*\d*)\s?(thousand|million|billion) (see its demo). You tried to make each subsequent pattern part optional, but you forgot that the \d+ in the number variable requires at least one digit. Since word_re contains two occurrences of number, the whole resulting regex requires at least two digits. With $7 million there is only one digit, so word_re fails and money_conversion falls back to value_re, which matches just $7 and returns 7.
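To see the two-digit requirement in action, here is a minimal sketch that runs the patterns from the question against both inputs. The variable names two_digits and one_digit are only for illustration.

```python
import re

# Patterns copied unchanged from the question.
number = r'\d+(,\d{3})*\.*\d*'
amount = r'thousand|million|billion'
word_re = rf'\${number}(-|\sto\s|–)?(\$*{number})\s?({amount})'
value_re = rf'\${number}'

two_digits = re.search(word_re, '$17 million', flags=re.I)
one_digit = re.search(word_re, '$7 million', flags=re.I)

# With two digits, the first `number` backtracks to "1" so that the
# second, mandatory `number` (group 3) can take "7".
print(two_digits.group())    # the whole "$17 million" is matched
print(two_digits.group(3))   # the digit consumed by the second `number`

# With one digit there is nothing left for the second `number`,
# so word_re does not match at all.
print(one_digit)

# value_re still matches, which is why the function returns 7.
print(re.search(value_re, '$7 million').group())
```

The match on $17 million only succeeds because the regex engine splits the digits between the two number parts, which is not what the pattern was meant to do.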
You need to use
number = r'\d+(?:,\d{3})*(?:\.\d+)?'
word_re = rf'\${number}(?:(?:-|\sto\s|–)\${number})?\s*({amount})'
See the Python demo.
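As a quick check, the corrected patterns can be tried on a few inputs. This is only a sketch: the sample strings are made up for illustration, and amount is taken unchanged from the question.

```python
import re

number = r'\d+(?:,\d{3})*(?:\.\d+)?'
amount = r'thousand|million|billion'  # unchanged from the question
word_re = rf'\${number}(?:(?:-|\sto\s|–)\${number})?\s*({amount})'

# Hypothetical sample inputs, not taken from any real data set.
samples = ['$7 million', '$17 million', '$12.2 million',
           '$3-$5 million', '$790,000']

# Map each sample to its match object (or None when word_re fails).
results = {s: re.search(word_re, s, flags=re.I) for s in samples}

for s, m in results.items():
    print(s, '->', m.group() if m else None)
```

Here $790,000 is expected to give no word_re match, since it has no thousand/million/billion word, so it would still go through value_re. Note also that the second $ is mandatory in this version, so a range written as $3-5 million would not match word_re. If your data contains such ranges (an assumption about your input), \$? in place of \$ inside the optional group is one way to allow them.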
Details:
- \$\d+(?:,\d{3})*(?:\.\d+)? - matches $, one or more digits, then zero or more repetitions of a comma and a three-digit chunk, and then an optional . and one or more digits
- (?:(?:-|\sto\s|–)\$\d+(?:,\d{3})*(?:\.\d+)?)? - an optional sequence of:
    - (?:-|\sto\s|–) - a -, whitespace + to + whitespace, or –
    - \$ - a $ char
    - \d+(?:,\d{3})*(?:\.\d+)? - see above
- \s* - zero or more whitespaces
- (thousand|million|billion) - one of the three strings.
Upvotes: 2