Reputation: 29
I can't figure out what this error means or how to fix it.
texts = [[word for word in document.translate(trans_table).lower().split()] for document in live_text]
TypeError: character mapping must return integer, None or unicode
My code:
import re

rows = cursor.fetchall()
listTSeps = []
for row in rows:
    listTSeps.append(re.sub('[^A-Za-z0-9]+', ' ', row[0]))
# Close the cursor and connection now that reading from the database is done
cursor.close()
conn.close()
live_text=listTSeps
trans_table = ''.join( [chr(i) for i in range(128)] + [' '] * 128 )
texts = [[word for word in document.translate(trans_table).lower().split()] for document in live_text]
text_matrix = ["None"]*len(live_text)
From searching the web, it seems this can be solved using .encode('ascii') or ord().
I am new to Python and trying to learn from sample code; I got this snippet from a friend. Could somebody please explain the source of the problem and how to fix it? Thanks.
Upvotes: 0
Views: 5918
Reputation: 59516
Your document is a unicode, not a str. For unicode, the translate() method needs a different kind of table: a mapping, not a 256-character string.
help(u' '.translate)
yields:
Help on built-in function translate:
translate(...)
S.translate(table) -> unicode
Return a copy of the string S, where all characters have been mapped
through the given translation table, which must be a mapping of
Unicode ordinals to Unicode ordinals, Unicode strings or None.
Unmapped characters are left untouched. Characters mapped to None
are deleted.
A dictionary like this is fine:
u'abcd efgh'.translate({ 32: u'x' })
u'abcdxefgh'
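If you want to stay with translate() but cover every code point above 127 rather than a single character, one option is a dict subclass that computes the mapping on demand. This is only a minimal sketch: the class name NonAsciiToSpace is made up for illustration, and it relies on translate() looking up each ordinal by indexing the table, which is how CPython behaves.

```python
# -*- coding: utf-8 -*-

class NonAsciiToSpace(dict):
    """Hypothetical helper: maps any ordinal >= 128 to a space."""

    def __missing__(self, codepoint):
        # translate() looks up each character's ordinal in the table.
        # ASCII ordinals map to themselves; everything else becomes a space.
        return codepoint if codepoint < 128 else u' '


table = NonAsciiToSpace()
cleaned = u'abcd\xe4efgh'.translate(table)  # u'\xe4' is the character 'ä'
```

The table starts out empty, so every lookup goes through __missing__ and you never have to list the non-ASCII code points yourself.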
For your case where you just want to replace all characters above ASCII 127 with a space, you might want to consider this:
re.sub(r'[^\x00-\x7f]', ' ', u'abcdäefgh')
u'abcd efgh'
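Applied to the list comprehension from the question, the re.sub approach might look roughly like the sketch below. The sample strings in live_text are invented stand-ins for the rows read from the database, and non_ascii is just a name chosen here for the compiled pattern.

```python
# -*- coding: utf-8 -*-
import re

# Made-up sample data standing in for the rows fetched from the database.
live_text = [u'Hello W\xf6rld', u'caf\xe9 au lait']

# Matches any single character outside the ASCII range 0-127.
non_ascii = re.compile(r'[^\x00-\x7f]')

# Replace non-ASCII characters with spaces, then lowercase and tokenize.
texts = [non_ascii.sub(u' ', document).lower().split()
         for document in live_text]
```

Note that the loop in the question already replaces everything outside A-Za-z0-9 with a space, so this extra step may turn out to be redundant for that particular data.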
Upvotes: 2