minks

Reputation: 3029

Converting Numbers to Words

I have a list of lists of strings in which I want to convert the numbers into their text equivalents, e.g. 2 to two.

This is what result looks like:

[
    ['nn', 'known', 'tsutsumi', 'father', 'yasujiro', 'sow', 'seed', 'family', 'dominion'],
    ['un', 'secretari', 'gener', 'kofi', 'annan', 'appoint', 'special', 'repres', 'iraq', 'help', 'improv', 'commun', 'iraqi', 'leader'],
    ['year', '2016']
]

Here is my code:

from num2words import num2words

result=[]
with open("./Stemmingg.txt") as filer:
    for line in filer:
        result.append(line.strip().split())

temp=[]

for item in result:
    r=num2words(item)
    temp.append(r)

However, this gives me an error which says:

TypeError: type(['nn', 'known', 'tsutsumi', 'father', 'yasujiro', 'sow', 'seed', 'family', 'dominion']) not in [long, int, float]

Upvotes: 1

Views: 1012

Answers (3)

Learner

Reputation: 5292

First, build result as a flattened list, i.e. with no nested lists inside it. Then check whether each item is a number (isdigit() is true for integer strings, but not for floats) and, if it is, convert it with literal_eval before passing it to num2words, since num2words expects a number, not a str.

from num2words import num2words
from ast import literal_eval

result = []
with open("/Users/mr/Documents/Stemmingg.txt",'r') as filer:
    for line in filer:
        lst = line.strip().split()  # split every line on whitespace
        for item in lst:
            result.append(item.strip())  # build a flat list, one item at a time

temp = []
for item in result:
    if item.isdigit():  # true for integer strings only, not floats
        r = num2words(literal_eval(item))  # literal_eval converts the string to a number
        temp.append(r)
    else:
        pass
print(temp)

N.B. If you want to keep all the other words too, change

This

else:
    pass

To

else:
    temp.append(item)
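
One caveat about the conversion step, as a minimal sketch: on Python 3, literal_eval raises a SyntaxError for digit strings with leading zeros such as '007', while int() accepts them. The helper name parse_int_token below is hypothetical and only there for illustration; it is not part of num2words.

```python
from ast import literal_eval


def parse_int_token(token):
    """Return the int value of a digit-only string, or None otherwise.

    Hypothetical helper, not part of num2words.
    """
    if token.isdigit():
        # int() tolerates leading zeros, e.g. '007'
        return int(token)
    return None


for token in ['2016', '007', '2.5', '-5', 'year']:
    print(token, parse_int_token(token))

# On Python 3, literal_eval refuses leading zeros in integer literals.
try:
    literal_eval('007')
except SyntaxError:
    print("literal_eval rejected '007'")
```

Note that isdigit() is also false for negative numbers such as '-5', so those are left alone just like floats.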

Upvotes: 2

Will

Reputation: 24699

You have a list of lists, not a list of strs. This would be a naive approach:

from num2words import num2words
result=[]
with open("/Users/mr/Documents/Stemmingg.txt") as filer:
    for line in filer:
        result.append(line.strip().split())

result = [[
    num2words(subitem) if isinstance(subitem, (int, float, long)) else subitem for subitem in item
] for item in result]

This is a nested list comprehension: the outer comprehension walks over each inner list, and the inner one over each string in it.

Now, this still has a problem! Everything read from the file is a str, so for the string '22' our isinstance() check fails. We need some additional logic, with the help of isdigit():

def digitsToWords(item):
    if isinstance(item, (int, float, long)):
        return num2words(item)

    if isinstance(item, (str, unicode)):
        if item.isdigit():
            return num2words(int(item))

        if item.replace('.', '', 1).isdigit():
            return num2words(float(item))

    return item

result = [[digitsToWords(subitem) for subitem in item] for item in result]

If you don't want to convert floats to words, do this instead:

def digitsToWords(item):
    if isinstance(item, (int, long)):
        return num2words(item)

    if isinstance(item, (str, unicode)) and item.isdigit():
        return num2words(int(item))

    return item

result = [[digitsToWords(subitem) for subitem in item] for item in result]
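
The code above targets Python 2; long and unicode do not exist on Python 3. Here is a rough Python 3 sketch of the same idea. To keep it runnable without the library, the converter is passed in as a parameter: to_words, digits_to_words, convert_nested and fake_words are all names made up for this example, and in real use you would presumably pass num2words as to_words.

```python
def digits_to_words(item, to_words):
    """Convert numeric items with to_words; return anything else unchanged."""
    if isinstance(item, (int, float)):
        return to_words(item)

    if isinstance(item, str):
        if item.isdigit():
            return to_words(int(item))

        # Allow a single decimal point, e.g. '2.5'
        if item.replace('.', '', 1).isdigit():
            return to_words(float(item))

    return item


def convert_nested(rows, to_words):
    """Apply digits_to_words to every item, keeping the list-of-lists shape."""
    return [[digits_to_words(item, to_words) for item in row] for row in rows]


def fake_words(number):
    """Stand-in converter for the demo; replace with num2words in real code."""
    return '<%s>' % number


result = [['un', 'secretari', 'gener'], ['year', '2016']]
print(convert_nested(result, fake_words))
```

Passing the converter in also makes the function easy to test without installing anything.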

Upvotes: 2

Schiem

Reputation: 589

That specific error occurs because result is actually a list of lists, not a list of strings.

So when you write something like

for item in result:
    r=num2words(item)

item will actually be

['nn', 'known', 'tsutsumi', 'father', 'yasujiro', 'sow', 'seed', 'family', 'dominion']

Your options for that are either to flatten it into a single dimensional array or to have a nested for loop, like so (or use a nested list comprehension, as answered above):

for arr in result:
    for item in arr: 
        r=num2words(item)

However, you still have a problem--num2words must take a number. None of your items are actually numbers (they're all strings). Since you're parsing from a file, you should probably try to cast to an int, and only convert it if it works. So the code would look something like:

from num2words import num2words
result=[]
with open("/Users/mr/Documents/Stemmingg.txt") as filer:
    for line in filer:
        result.append(line.strip().split())

temp=[]
for arr in result:
    for item in arr: 
        try:
            r=num2words(int(item))
            temp.append(r)
        except ValueError:  # item is not an integer string, so skip it
            pass
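
If you would rather keep the non-numeric words and the nested shape, the same try/except idea could look roughly like this. It is only a sketch: convert_token and the to_words parameter are hypothetical names for this example, and you would pass num2words as to_words in real code. Unlike an isdigit() check, int() also accepts negative numbers such as '-5'.

```python
def convert_token(token, to_words):
    """Try to read token as an int; fall back to the original string."""
    try:
        number = int(token)
    except ValueError:
        # Not an integer string: keep the word as it is
        return token
    return to_words(number)


def stand_in(number):
    """Demo converter only; swap in num2words for real use."""
    return 'num:%d' % number


result = [['nn', 'known'], ['year', '2016'], ['minus', '-5']]
temp = [[convert_token(item, stand_in) for item in arr] for arr in result]
print(temp)
```

Only the int() call sits inside the try block, so an error raised by the converter itself is not silently swallowed.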

Upvotes: -1
