Reputation:
I need to convert one into 1, two into 2, and so on.
Is there a way to do this with a library or a class or anything?
Upvotes: 108
Views: 175562
Reputation: 1
It's a cool solution, so I took @recursive's Python code from their answer and with help of ChatGPT I converted it to C# and also simplified it, formatted it, and made it a bit more compact.
Yes, I had to give a ton of instructions to ChatGPT. It took me a while, but here it is.
I believe this version makes the code and the algorithm easier to follow:
using System.Collections.Generic;
public class Parser
{
    public static int ParseInt(string s)
    {
        // Each word maps to a (scale, increment) pair, as in the Python original.
        Dictionary<string, (int scale, int increment)> numwords = new Dictionary<string, (int, int)>
        {
            {"and", (1, 0)}, {"zero", (1, 0)}, {"one", (1, 1)}, {"two", (1, 2)}, {"three", (1, 3)},
            {"four", (1, 4)}, {"five", (1, 5)}, {"six", (1, 6)}, {"seven", (1, 7)}, {"eight", (1, 8)},
            {"nine", (1, 9)}, {"ten", (1, 10)}, {"eleven", (1, 11)}, {"twelve", (1, 12)}, {"thirteen", (1, 13)},
            {"fourteen", (1, 14)}, {"fifteen", (1, 15)}, {"sixteen", (1, 16)}, {"seventeen", (1, 17)}, {"eighteen", (1, 18)},
            {"nineteen", (1, 19)}, {"twenty", (1, 20)}, {"thirty", (1, 30)}, {"forty", (1, 40)}, {"fifty", (1, 50)},
            {"sixty", (1, 60)}, {"seventy", (1, 70)}, {"eighty", (1, 80)}, {"ninety", (1, 90)}, {"hundred", (100, 0)},
            {"thousand", (1000, 0)}, {"million", (1000000, 0)}, {"billion", (1000000000, 0)}
        };
        // Note: int overflows above 2,147,483,647, so "three billion" will not fit.
        int current = 0;
        int result = 0;
        foreach (string word in s.Replace("-", " ").Split())
        {
            var (scale, increment) = numwords[word];
            current = current * scale + increment;
            // "thousand" and larger close the current group and add it to the total.
            if (scale > 100)
            {
                result += current;
                current = 0;
            }
        }
        return result + current;
    }
}
Upvotes: -1
Reputation: 56
I was looking for a library that supports all of the above plus more edge cases, such as ordinal numbers (first, second), bigger numbers, and operators, and I found this numwords-to-nums
You can install via
pip install numwords_to_nums
Here's a basic example
from numwords_to_nums.numwords_to_nums import NumWordsToNum
num = NumWordsToNum()
result = num.numerical_words_to_numbers("twenty ten and twenty one")
print(result) # Output: 2010 and 21
eval_result = num.evaluate('Hey calculate 2+5')
print(eval_result) # Output: 7
result = num.numerical_words_to_numbers('first')
print(result) # Output: 1st
Upvotes: 3
Reputation: 1
I found a faster way:
Da_Unità_a_Cifre = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11,
                    'twelve': 12, 'thirteen': 13, 'fourteen': 14, 'fifteen': 15, 'sixteen': 16, 'seventeen': 17, 'eighteen': 18, 'nineteen': 19}
Da_Lettere_a_Decine = {"tw": 20, "th": 30, "fo": 40, "fi": 50, "si": 60, "se": 70, "ei": 80, "ni": 90, }
elemento = input("insert the word:")
Val_Num = 0
try:
    # str methods return new strings, so the result has to be reassigned
    elemento = elemento.lower().strip()
    if elemento in Da_Unità_a_Cifre:
        # plain words from "one" to "nineteen"
        Val_Num = Da_Unità_a_Cifre[elemento]
    elif elemento == "onehundred":
        Val_Num = 100
    elif elemento[-1] == "y":
        # bare tens such as "forty": the first two letters identify the word
        Val_Num = Da_Lettere_a_Decine[elemento[:2]]
    else:
        # compound such as "fortyfive": the part after "ty" is the unit, e.g. "five"
        Unità = elemento[elemento.find("ty") + 2:]
        Val_Num = Da_Lettere_a_Decine[elemento[:2]] + Da_Unità_a_Cifre[Unità]
    print(Val_Num)
except (KeyError, IndexError):
    print("invalid input")
Upvotes: -1
Reputation: 1
This code works only for numbers up to 99, in both directions (word to int and int to word). Handling the rest would need another 10-20 lines of simple logic; this is just simple code for beginners:
num = input("Enter the number you want to convert : ")
mydict = {'1': 'One', '2': 'Two', '3': 'Three', '4': 'Four', '5': 'Five','6': 'Six', '7': 'Seven', '8': 'Eight', '9': 'Nine', '10': 'Ten','11': 'Eleven', '12': 'Twelve', '13': 'Thirteen', '14': 'Fourteen', '15': 'Fifteen', '16': 'Sixteen', '17': 'Seventeen', '18': 'Eighteen', '19': 'Nineteen'}
mydict2 = ['', '', 'Twenty', 'Thirty', 'Forty', 'Fifty', 'Sixty', 'Seventy', 'Eighty', 'Ninety']
if num.isdigit():
    value = int(num)
    if 0 < value < 20:
        print(" :---> " + mydict[str(value)])
    elif 20 <= value < 100:
        var1 = value % 10
        var2 = value // 10  # integer division gives the tens digit
        # mydict has no entry for 0, so round tens print the tens word alone
        print(" :---> " + mydict2[var2] + mydict.get(str(var1), ''))
    else:
        print("----->Invalid Input<-----")
else:
    num = num.lower()
    dict_w = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11, 'twelve': 12, 'thirteen': 13, 'fourteen': 14, 'fifteen': 15, 'sixteen': 16, 'seventeen': 17, 'eighteen': 18, 'nineteen': 19}
    mydict2 = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
    divide = num[num.find("ty")+2:]
    if num:
        if num in dict_w:
            print(" :---> " + str(dict_w[num]))
        elif divide == '':
            # bare tens word such as "forty"; the range includes the last entry, 'ninety'
            for i in range(len(mydict2)):
                if mydict2[i] == num:
                    print(" :---> " + str(i * 10))
        else:
            str3 = 0
            str1 = num[num.find("ty")+2:]  # unit part, e.g. "five"
            str2 = num[:-len(str1)]        # tens part, e.g. "forty"
            for i in range(0, len(mydict2)):
                if mydict2[i] == str2:
                    str3 = i
            if str2 not in mydict2:
                print("----->Invalid Input<-----")
            else:
                try:
                    print(" :---> " + str((str3*10) + dict_w[str1]))
                except KeyError:
                    print("----->Invalid Input<-----")
    else:
        print("----->Please Enter Input<-----")
Upvotes: -3
Reputation: 61
def parse_int(string):
    ONES = {'zero': 0,
            'one': 1,
            'two': 2,
            'three': 3,
            'four': 4,
            'five': 5,
            'six': 6,
            'seven': 7,
            'eight': 8,
            'nine': 9,
            'ten': 10,
            'eleven': 11,
            'twelve': 12,
            'thirteen': 13,
            'fourteen': 14,
            'fifteen': 15,
            'sixteen': 16,
            'seventeen': 17,
            'eighteen': 18,
            'nineteen': 19,
            'twenty': 20,
            'thirty': 30,
            'forty': 40,
            'fifty': 50,
            'sixty': 60,
            'seventy': 70,
            'eighty': 80,
            'ninety': 90,
            }
    numbers = []
    for token in string.replace('-', ' ').split(' '):
        if token in ONES:
            numbers.append(ONES[token])
        elif token == 'hundred':
            numbers[-1] *= 100
        elif token == 'thousand':
            numbers = [x * 1000 for x in numbers]
        elif token == 'million':
            numbers = [x * 1000000 for x in numbers]
    return sum(numbers)
Tested with 700 random numbers in the range 1 to one million; it works well.
Upvotes: 6
Reputation: 91
Make use of the Python package: WordToDigits
pip install wordtodigits
It can find numbers present in word form in a sentence and then convert them to the proper numeric format. Also takes care of the decimal part, if present. The word representation of numbers could be anywhere in the passage.
Upvotes: 4
Reputation: 98002
I needed to handle a couple extra parsing cases, such as ordinal words ("first", "second"), hyphenated words ("one-hundred"), and hyphenated ordinal words like ("fifty-seventh"), so I added a couple lines:
def text2int(textnum, numwords={}):
    if not numwords:
        units = [
            "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen",
        ]
        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
        scales = ["hundred", "thousand", "million", "billion", "trillion"]
        numwords["and"] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)
    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]
    textnum = textnum.replace('-', ' ')
    current = result = 0
    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)
            if word not in numwords:
                raise Exception("Illegal word: " + word)
            scale, increment = numwords[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0
    return result + current
Upvotes: 12
Reputation: 1
This handles numbers written in the Indian style, some fractions, combinations of digits and words, and also addition.
def words_to_number(words):
    numbers = {"zero":0, "a":1, "half":0.5, "quarter":0.25, "one":1,"two":2,
               "three":3, "four":4,"five":5,"six":6,"seven":7,"eight":8,
               "nine":9, "ten":10,"eleven":11,"twelve":12, "thirteen":13,
               "fourteen":14, "fifteen":15,"sixteen":16,"seventeen":17,
               "eighteen":18,"nineteen":19, "twenty":20,"thirty":30, "forty":40,
               "fifty":50,"sixty":60,"seventy":70, "eighty":80,"ninety":90}
    groups = {"hundred":100, "thousand":1_000,
              "lac":1_00_000, "lakh":1_00_000,
              "million":1_000_000, "crore":10**7,
              "billion":10**9, "trillion":10**12}
    split_at = ["and", "plus"]
    n = 0
    skip = False
    words_array = words.split(" ")
    for i, word in enumerate(words_array):
        if not skip:
            if word in groups:
                n *= groups[word]
            elif word in numbers:
                n += numbers[word]
            elif word in split_at:
                # Everything after "and"/"plus" is parsed recursively and added
                skip = True
                remaining = ' '.join(words_array[i+1:])
                n += words_to_number(remaining)
            else:
                try:
                    n += float(word)
                except ValueError as e:
                    raise ValueError(f"Invalid word {word}") from e
    return n
TEST:
print(words_to_number("a million and one"))
>> 1000001
print(words_to_number("one crore and one"))
>> 10000001
print(words_to_number("0.5 million one"))
>> 500001.0
print(words_to_number("half million and one hundred"))
>> 500100.0
print(words_to_number("quarter"))
>> 0.25
print(words_to_number("one hundred plus one"))
>> 101
Upvotes: 0
Reputation: 1
This code works for a pandas Series (it uses the word2number package):
import pandas as pd
from word2number import w2n
mylist = pd.Series(['one','two','three'])
mylist1 = []
for x in range(len(mylist)):
    mylist1.append(w2n.word_to_num(mylist[x]))
print(mylist1)
Upvotes: -2
Reputation: 915
I took @recursive's logic and converted it to Ruby. I've also hardcoded the lookup table, so it's not as cool, but it might help a newbie understand what is going on.
WORDNUMS = {"zero"=> [1,0], "one"=> [1,1], "two"=> [1,2], "three"=> [1,3],
"four"=> [1,4], "five"=> [1,5], "six"=> [1,6], "seven"=> [1,7],
"eight"=> [1,8], "nine"=> [1,9], "ten"=> [1,10],
"eleven"=> [1,11], "twelve"=> [1,12], "thirteen"=> [1,13],
"fourteen"=> [1,14], "fifteen"=> [1,15], "sixteen"=> [1,16],
"seventeen"=> [1,17], "eighteen"=> [1,18], "nineteen"=> [1,19],
"twenty"=> [1,20], "thirty" => [1,30], "forty" => [1,40],
"fifty" => [1,50], "sixty" => [1,60], "seventy" => [1,70],
"eighty" => [1,80], "ninety" => [1,90],
"hundred" => [100,0], "thousand" => [1000,0],
"million" => [1000000, 0]}
def text_2_int(string)
  numberWords = string.gsub('-', ' ').split(/ /) - %w{and}
  current = result = 0
  numberWords.each do |word|
    scale, increment = WORDNUMS[word]
    current = current * scale + increment
    if scale > 100
      result += current
      current = 0
    end
  end
  return result + current
end
I was looking to handle strings like two thousand one hundred and forty-six
Upvotes: 0
Reputation: 2608
I needed something a bit different since my input is from a speech-to-text conversion and the solution is not always to sum the numbers. For example, "my zipcode is one two three four five" should not convert to "my zipcode is 15".
I took Andrew's answer and tweaked it to handle a few other cases people highlighted as errors, and also added support for examples like the zipcode one I mentioned above. Some basic test cases are shown below, but I'm sure there is still room for improvement.
def is_number(x):
    if type(x) == str:
        x = x.replace(',', '')
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True
def text2int (textnum, numwords={}):
    units = [
        'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
        'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen',
        'sixteen', 'seventeen', 'eighteen', 'nineteen',
    ]
    tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
    scales = ['hundred', 'thousand', 'million', 'billion', 'trillion']
    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]
    if not numwords:
        numwords['and'] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)
    textnum = textnum.replace('-', ' ')
    current = result = 0
    curstring = ''
    onnumber = False
    lastunit = False
    lastscale = False
    def is_numword(x):
        if is_number(x):
            return True
        # Check the argument itself rather than the enclosing loop variable
        if x in numwords:
            return True
        return False
    def from_numword(x):
        if is_number(x):
            scale = 0
            increment = int(x.replace(',', ''))
            return scale, increment
        return numwords[x]
    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0
            onnumber = True
            lastunit = False
            lastscale = False
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)
            if (not is_numword(word)) or (word == 'and' and not lastscale):
                if onnumber:
                    # Flush the current number we are building
                    curstring += repr(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
                lastunit = False
                lastscale = False
            else:
                scale, increment = from_numword(word)
                onnumber = True
                if lastunit and (word not in scales):
                    # Assume this is part of a string of individual numbers to
                    # be flushed, such as a zipcode "one two three four five"
                    curstring += repr(result + current)
                    result = current = 0
                if scale > 1:
                    current = max(1, current)
                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0
                lastscale = False
                lastunit = False
                if word in scales:
                    lastscale = True
                elif word in units:
                    lastunit = True
    if onnumber:
        curstring += repr(result + current)
    return curstring
Some tests...
one two three -> 123
three forty five -> 345
three and forty five -> 3 and 45
three hundred and forty five -> 345
three hundred -> 300
twenty five hundred -> 2500
three thousand and six -> 3006
three thousand six -> 3006
nineteenth -> 19
twentieth -> 20
first -> 1
my zip is one two three four five -> my zip is 12345
nineteen ninety six -> 1996
fifty-seventh -> 57
one million -> 1000000
first hundred -> 100
I will buy the first thousand -> I will buy the 1000 # probably should leave ordinal in the string
thousand -> 1000
hundred and six -> 106
1 million -> 1000000
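The zipcode case on its own can be isolated in a much smaller sketch. The helper name `join_spoken_digits` and the `DIGIT_WORDS` table are hypothetical, and this version only understands the single-digit words "zero" to "nine"; it is not a replacement for the function above, just an illustration of joining a run of spoken digits instead of summing them.

```python
# Hypothetical helper: only single-digit words are recognised here.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def join_spoken_digits(text):
    """Replace each run of consecutive digit words with the joined digits."""
    out = []
    run = ""  # digits collected from the current run of digit words
    for word in text.split():
        if word in DIGIT_WORDS:
            run += DIGIT_WORDS[word]
        else:
            if run:
                # A non-digit word ends the run, so emit it first.
                out.append(run)
                run = ""
            out.append(word)
    if run:
        out.append(run)
    return " ".join(out)

print(join_spoken_digits("my zipcode is one two three four five"))
# my zipcode is 12345
```

Keeping the digits as a string also preserves leading zeros, which matters for zipcodes.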
Upvotes: 19
Reputation: 171
If anyone is interested, I hacked up a version that maintains the rest of the string (though it may have bugs, haven't tested it too much).
def text2int (textnum, numwords={}):
    if not numwords:
        units = [
            "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen",
        ]
        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
        scales = ["hundred", "thousand", "million", "billion", "trillion"]
        numwords["and"] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)
    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]
    textnum = textnum.replace('-', ' ')
    current = result = 0
    curstring = ""
    onnumber = False
    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0
            onnumber = True
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)
            if word not in numwords:
                if onnumber:
                    curstring += repr(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
            else:
                scale, increment = numwords[word]
                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0
                onnumber = True
    if onnumber:
        curstring += repr(result + current)
    return curstring
Example:
>>> text2int("I want fifty five hot dogs for two hundred dollars.")
I want 55 hot dogs for 200 dollars.
There could be issues if you have, say, "$200". But, this was really rough.
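One possible way to reduce punctuation problems like that is to tokenize with a regular expression before looking words up, so that "$", "," and "." become tokens of their own. This is only a sketch: `TOKEN_PATTERN` and `tokenize` are names made up here, and rejoining the tokens afterwards would not restore the original spacing.

```python
import re

# Letter runs, digit runs, and single punctuation marks become separate tokens.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")

def tokenize(text):
    """Split text so punctuation never stays attached to a number word."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("I paid $200, or two hundred."))
# ['I', 'paid', '$', '200', ',', 'or', 'two', 'hundred', '.']
```

With tokens like these, "hundred" is matched even when the sentence ends right after it, and "200" can be passed through untouched.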
Upvotes: 17
Reputation: 7639
There's a ruby gem by Marc Burns that does it. I recently forked it to add support for years. You can call ruby code from python.
require 'numbers_in_words'
require 'numbers_in_words/duck_punch'
nums = ["fifteen sixteen", "eighty five sixteen", "nineteen ninety six",
"one hundred and seventy nine", "thirteen hundred", "nine thousand two hundred and ninety seven"]
nums.each {|n| p n; p n.in_numbers}
results:
"fifteen sixteen"
1516
"eighty five sixteen"
8516
"nineteen ninety six"
1996
"one hundred and seventy nine"
179
"thirteen hundred"
1300
"nine thousand two hundred and ninety seven"
9297
Upvotes: 1
Reputation: 3147
I have just released a python module to PyPI called word2number for the exact purpose. https://github.com/akshaynagpal/w2n
Install it using:
pip install word2number
Make sure your pip is updated to the latest version.
Usage:
from word2number import w2n
print(w2n.word_to_num("two million three thousand nine hundred and eighty four"))
2003984
Upvotes: 44
Reputation: 6298
A quick solution is to use inflect.py to generate a dictionary for translation.
inflect.py has a number_to_words() function that will turn a number (e.g. 2) into its word form (e.g. 'two'). Unfortunately, its reverse (which would allow you to avoid the translation dictionary route) isn't offered. All the same, you can use that function to build the translation dictionary:
>>> import inflect
>>> p = inflect.engine()
>>> word_to_number_mapping = {}
>>>
>>> for i in range(1, 100):
...     word_form = p.number_to_words(i)  # 1 -> 'one'
...     word_to_number_mapping[word_form] = i
...
>>> print(word_to_number_mapping['one'])
1
>>> print(word_to_number_mapping['eleven'])
11
>>> print(word_to_number_mapping['forty-three'])
43
If you're willing to commit some time, it might be possible to examine the inner workings of inflect.py's number_to_words() function and build your own code to do this dynamically (I haven't tried to do this).
Upvotes: 1
Reputation: 86124
The majority of this code is to set up the numwords dict, which is only done on the first call.
def text2int(textnum, numwords={}):
    if not numwords:
        units = [
            "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen",
        ]
        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
        scales = ["hundred", "thousand", "million", "billion", "trillion"]
        numwords["and"] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)
    current = result = 0
    for word in textnum.split():
        if word not in numwords:
            raise Exception("Illegal word: " + word)
        scale, increment = numwords[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0
    return result + current
print(text2int("seven billion one hundred million thirty one thousand three hundred thirty seven"))
#7100031337
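To see how the (scale, increment) pairs combine, here is a minimal tracing sketch. The `trace` helper and the trimmed `NUMWORDS` table are illustrative names and not part of the answer above; only the update rule is taken from it.

```python
# Trimmed lookup table: word -> (scale, increment). Illustrative subset only.
NUMWORDS = {
    "one": (1, 1), "two": (1, 2), "three": (1, 3), "forty": (1, 40),
    "hundred": (100, 0), "thousand": (1000, 0),
}

def trace(text):
    """Return the final value and the (word, current, result) state after each word."""
    current = result = 0
    steps = []
    for word in text.split():
        scale, increment = NUMWORDS[word]
        # Plain number words add to the running group; "hundred" multiplies it.
        current = current * scale + increment
        if scale > 100:
            # "thousand" and larger close the group and bank it in result.
            result += current
            current = 0
        steps.append((word, current, result))
    return result + current, steps

total, steps = trace("two thousand three hundred forty one")
for step in steps:
    print(step)
print(total)  # 2341
```

After "thousand" the state is `current = 0, result = 2000`; the remaining words build 341 in `current`, and the two are added at the end.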
Upvotes: 143
Reputation: 11
I made a change so that text2int(scale) returns the correct conversion, e.g. text2int("hundred") => 100.
import re
numwords = {}
def text2int(textnum):
    if not numwords:
        units = [ "zero", "one", "two", "three", "four", "five", "six",
                  "seven", "eight", "nine", "ten", "eleven", "twelve",
                  "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
                  "eighteen", "nineteen"]
        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
                "seventy", "eighty", "ninety"]
        scales = ["hundred", "thousand", "million", "billion", "trillion",
                  'quadrillion', 'quintillion', 'sextillion', 'septillion',
                  'octillion', 'nonillion', 'decillion' ]
        numwords["and"] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)
    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5,
                     'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]
    current = result = 0
    tokens = re.split(r"[\s-]+", textnum)
    for word in tokens:
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)
            if word not in numwords:
                raise Exception("Illegal word: " + word)
            scale, increment = numwords[word]
        # A bare scale word such as "hundred" counts as one of that scale
        if scale > 1:
            current = max(1, current)
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0
    return result + current
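The effect of the `max(1, current)` guard can be shown in isolation. The `apply_word` helper and its `guard` flag below are hypothetical and exist only for this comparison; the arithmetic is the same update step used in the function above.

```python
def apply_word(current, scale, increment, guard=True):
    """Apply one (scale, increment) pair to the running group value."""
    if guard and scale > 1:
        # A scale word with nothing before it means one of that scale.
        current = max(1, current)
    return current * scale + increment

print(apply_word(0, 100, 0, guard=False))  # 0   ("hundred" without the guard)
print(apply_word(0, 100, 0))               # 100 ("hundred" with the guard)
print(apply_word(3, 100, 0))               # 300 ("three hundred" is unchanged)
```

Without the guard, a leading scale word multiplies zero and disappears; with it, inputs that already have a number in front behave exactly as before.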
Upvotes: 1
Reputation: 6921
This could easily be hardcoded into a dictionary if there's a limited set of numbers you'd like to parse.
For slightly more complex cases, you'll probably want to generate this dictionary automatically, based on the relatively simple numbers grammar. Something along the lines of this (of course, generalized...)
for i in range(10):
    myDict[30 + i] = "thirty-" + singleDigitsDict[i]
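Generalized to every tens word, the idea might look like the sketch below. The names `UNITS`, `TENS`, and `build_word_table` are assumptions made for this example, and the table is built in the word-to-number direction because that is what the question asks for.

```python
UNITS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {
    20: "twenty", 30: "thirty", 40: "forty", 50: "fifty",
    60: "sixty", 70: "seventy", 80: "eighty", 90: "ninety",
}

def build_word_table():
    """Build a word -> number dictionary covering 0 to 99."""
    table = {word: value for value, word in enumerate(UNITS)}
    for tens_value, tens_word in TENS.items():
        table[tens_word] = tens_value
        # Hyphenated compounds: "thirty-one" ... "thirty-nine", and so on.
        for unit in range(1, 10):
            table[f"{tens_word}-{UNITS[unit]}"] = tens_value + unit
    return table

WORD_TO_NUMBER = build_word_table()
print(WORD_TO_NUMBER["thirty-seven"])  # 37
print(len(WORD_TO_NUMBER))             # 100
```

This only accepts the hyphenated spelling; supporting "thirty seven" as well would mean adding a second key per compound or normalizing the input first.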
If you need something more extensive, then it looks like you'll need natural language processing tools. This article might be a good starting point.
Upvotes: 3
Reputation: 14080
Here's the trivial case approach:
>>> number = {'one':1,
... 'two':2,
... 'three':3,}
>>>
>>> number['two']
2
Or are you looking for something that can handle "twelve thousand, one hundred seventy-two"?
Upvotes: 6