Suresh M

Reputation: 29

TypeError: character mapping must return integer, None or unicode

I can't figure out what this error means or how to fix it. It is raised by this line:

    texts = [[word for word in document.translate(trans_table).lower().split()] for document in live_text]

TypeError: character mapping must return integer, None or unicode

My code:

rows=cursor.fetchall()
listTSeps=[]

for row in rows:
    listTSeps.append(re.sub('[^A-Za-z0-9]+', ' ', row[0]))

# Close the cursor and connection once we are done reading from the database
cursor.close()
conn.close()


live_text=listTSeps
trans_table = ''.join( [chr(i) for i in range(128)] + [' '] * 128 )

texts = [[word for word in document.translate(trans_table).lower().split()] for document in live_text]

text_matrix = ["None"]*len(live_text)

From searching the web, it seems this can be solved using .encode('ascii') or ord().

I am new to Python and am learning from sample code; this snippet came from a friend. Could somebody please explain the source of the problem and how to fix it? Thanks.

Upvotes: 0

Views: 5918

Answers (1)

Alfe

Reputation: 59516

Your document is a unicode object, not a str. For unicode, the argument to translate() has to be something different: a mapping keyed by Unicode ordinals, not a 256-character string.

help(u' '.translate)

yields:

Help on built-in function translate:

translate(...)
    S.translate(table) -> unicode

    Return a copy of the string S, where all characters have been mapped
    through the given translation table, which must be a mapping of
    Unicode ordinals to Unicode ordinals, Unicode strings or None.
    Unmapped characters are left untouched. Characters mapped to None
    are deleted.

A dictionary like this is fine:

u'abcd efgh'.translate({ 32: u'x' })
u'abcdxefgh'
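
A dictionary covering every code point above 127 would be huge, so here is a minimal sketch of an alternative, assuming that translate() only needs an object supporting lookup by ordinal and treats a LookupError as "leave this character alone". The class name NonAsciiToSpace is a hypothetical helper, not something from your code:

```python
class NonAsciiToSpace(object):
    """Hypothetical translation table: every ordinal >= 128 maps to a space."""

    def __getitem__(self, ordinal):
        if ordinal >= 128:
            return u' '
        # Signal "unmapped" so translate() keeps ASCII characters unchanged.
        raise LookupError(ordinal)


cleaned = u'abcd\xe4efgh'.translate(NonAsciiToSpace())
```

With this table, the non-ASCII character in the middle of the string should come out as a single space while the ASCII letters stay as they are.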

For your case where you just want to replace all characters above ASCII 127 with a space, you might want to consider this:

re.sub(r'[^\x00-\x7f]', ' ', u'abcdäefgh')
u'abcd efgh'
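
Applied to your list comprehension, that could look roughly like the sketch below. The tokenize function and the sample live_text values are made up for illustration; in your script live_text would still come from the database rows:

```python
import re

# Matches any character outside the ASCII range 0-127.
NON_ASCII = re.compile(r'[^\x00-\x7f]')


def tokenize(documents):
    """Replace non-ASCII characters with spaces, lowercase, and split into words."""
    return [NON_ASCII.sub(u' ', document).lower().split() for document in documents]


# Sample data standing in for the rows read from the database.
live_text = [u'Caf\xe9 au lait', u'na\xefve Bayes']
texts = tokenize(live_text)
```

This replaces the translate() call entirely, so no translation table is needed and the same code handles both str and unicode documents.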

Upvotes: 2
