Reputation: 3029
I have a list of lists of strings in which I want to convert numbers into their text equivalents, e.g. 2 to two.
This is what result looks like:
[
['nn', 'known', 'tsutsumi', 'father', 'yasujiro', 'sow', 'seed', 'family', 'dominion'],
['un', 'secretari', 'gener', 'kofi', 'annan', 'appoint', 'special', 'repres', 'iraq', 'help', 'improv', 'commun', 'iraqi', 'leader'],
['year', '2016']
]
Here is my code:
from num2words import num2words
result=[]
with open("./Stemmingg.txt") as filer:
    for line in filer:
        result.append(line.strip().split())
temp=[]
for item in result:
    r=num2words(item)
    temp.append(r)
However, this gives me an error which says:
TypeError: type(['nn', 'known', 'tsutsumi', 'father', 'yasujiro', 'sow', 'seed', 'family', 'dominion']) not in [long, int, float]
Upvotes: 1
Views: 1012
Reputation: 5292
First, build result as a flattened list, i.e. one with no nested lists inside it. Then check whether each item is a whole number (an int or long) using the isdigit() method, and convert it with literal_eval before passing it to num2words, since num2words expects an int, not a str.
from num2words import num2words
from ast import literal_eval
result = []
with open("/Users/mr/Documents/Stemmingg.txt",'r') as filer:
    for line in filer:
        lst = line.strip().split()  # split every line on whitespace
        for item in lst:
            result.append(item.strip())  # build a flattened list by appending items one by one
temp=[]
for item in result:
    if item.isdigit():  # true for int or long strings, but not floats
        r=num2words(literal_eval(item))  # literal_eval converts the string to a number
        temp.append(r)
    else:
        pass
print(temp)
N.B. If you want to keep all the other words, change this:
    else:
        pass
to:
    else:
        temp.append(item)
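As a side note on the flattening step: if the nested list is already in memory, one possible alternative to the explicit inner loop is itertools.chain.from_iterable. This is only a sketch; the sample data below is made up to match the shape of the question's result.

```python
from itertools import chain

# Hypothetical sample data in the same shape as the question's `result`.
nested = [['nn', 'known'], ['un', 'iraq'], ['year', '2016']]

# chain.from_iterable yields the items of each inner list in order,
# so wrapping it in list() gives one flat list.
flat = list(chain.from_iterable(nested))
print(flat)
```

Note that flattening discards the line boundaries, so it only fits if you do not need to know which line a word came from.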
Upvotes: 2
Reputation: 24699
You have a list of lists, not a list of strs. This would be a naive approach:
from num2words import num2words
result=[]
with open("/Users/mr/Documents/Stemmingg.txt") as filer:
    for line in filer:
        result.append(line.strip().split())
result = [[
    num2words(subitem) if isinstance(subitem, (int, float, long)) else subitem for subitem in item
] for item in result]
This is a nested list comprehension; see here for more information about how those work.
Now, this still has a problem! If I have the string '22', our isinstance() check fails! So we might need some additional logic, with the help of isdigit():
def digitsToWords(item):
    if isinstance(item, (int, float, long)):
        return num2words(item)
    if isinstance(item, (str, unicode)):
        if item.isdigit():
            return num2words(int(item))
        if item.replace('.', '', 1).isdigit():
            return num2words(float(item))
    return item
result = [[digitsToWords(subitem) for subitem in item] for item in result]
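To see what the replace('.', '', 1).isdigit() check in the function above accepts, here is a small sketch. The helper name looks_like_number is made up for illustration and the sample strings are hypothetical.

```python
def looks_like_number(text):
    """Hypothetical helper: True for strings such as '22' or '3.14'."""
    # Drop at most one '.', then require that only digits remain.
    return text.replace('.', '', 1).isdigit()

for sample in ['22', '3.14', '1.2.3', '-5', 'year']:
    print(sample, looks_like_number(sample))
```

The check rejects '1.2.3' because only the first dot is removed, and it also rejects '-5', so negative numbers are left as plain strings by this approach.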
If you don't want to convert floats to words, do this instead:
def digitsToWords(item):
    if isinstance(item, (int, long)):
        return num2words(item)
    if isinstance(item, (str, unicode)) and item.isdigit():
        return num2words(int(item))
    return item
result = [[digitsToWords(subitem) for subitem in item] for item in result]
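If the nested list comprehension used in this answer is hard to read, it may help to compare it with the equivalent explicit loops. The sketch below uses len() as a stand-in transformation and made-up sample data, so it runs without num2words.

```python
# Hypothetical sample data shaped like the question's `result`.
rows = [['year', '2016'], ['un', 'iraq']]

# Outer comprehension walks the rows; inner one transforms each word.
lengths = [[len(word) for word in row] for row in rows]

# The same thing written as explicit nested loops.
expanded = []
for row in rows:
    new_row = []
    for word in row:
        new_row.append(len(word))
    expanded.append(new_row)

print(lengths == expanded)
```

Both versions keep the list-of-lists shape, which is the main difference from the flattening approach.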
Upvotes: 2
Reputation: 589
The reason for that specific error is that your array of results is actually an array of arrays.
So in a loop like
for item in result:
    r=num2words(item)
item will actually be
['nn', 'known', 'tsutsumi', 'father', 'yasujiro', 'sow', 'seed', 'family', 'dominion']
Your options for that are either to flatten it into a single dimensional array or to have a nested for loop, like so (or use a nested list comprehension, as answered above):
for arr in result:
    for item in arr:
        r=num2words(item)
However, you still have a problem: num2words must take a number, and none of your items are actually numbers (they're all strings). Since you're parsing from a file, you should probably try to cast each item to an int, and only convert it if that works. So the code would look something like:
from num2words import num2words
result=[]
with open("/Users/mr/Documents/Stemmingg.txt") as filer:
    for line in filer:
        result.append(line.strip().split())
temp=[]
for arr in result:
    for item in arr:
        try:
            r=num2words(int(item))
            temp.append(r)
        except ValueError:
            # int() could not parse this item, so it is not a number
            pass
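If you would rather keep the non-numeric words and the list-of-lists shape, the same try/except idea could be wrapped up roughly like this. It is a sketch only: words_for_numbers and fake_to_words are made-up names, and the converter is passed in as an argument so the example runs without num2words installed. In real use you would pass num2words itself.

```python
def words_for_numbers(rows, to_words):
    """Replace integer-like strings in a list of lists, keeping everything else.

    `to_words` is any callable that takes an int and returns a string
    (an assumption of this sketch; num2words fits that description).
    """
    converted = []
    for row in rows:
        new_row = []
        for item in row:
            try:
                number = int(item)
            except ValueError:
                # int() could not parse the string, so keep the word unchanged.
                new_row.append(item)
            else:
                new_row.append(to_words(number))
        converted.append(new_row)
    return converted


def fake_to_words(n):
    # Stand-in converter used only to make the sketch self-contained.
    return '<%d>' % n


sample = [['un', 'iraq'], ['year', '2016']]
print(words_for_numbers(sample, fake_to_words))
```

Unlike the isdigit() check, int() also accepts strings such as '-5', so negative numbers would be converted here too.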
Upvotes: -1