PeterMmm

Reputation: 24630

Convert Excel value from code page 1251 to unicode

I'm accessing an Excel file through Python to fix the encoding of some cells. My code so far:

from xlrd import *
from xlwt import *

wb = open_workbook('a.xls')

s = wb.sheets()[0]

for row in range(s.nrows):
    e = s.cell(row,9).value
    r = s.cell(row,11).value
    print e,' ',r.decode('cp1251')

When running this code I get this error:

Traceback (most recent call last):
  File "C:\Users\pem\workspace\a\src\a.py", line 17, in <module>
    print e,' ',r.decode('cp1251')
  File "C:\Python27\lib\encodings\cp1251.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
  File "C:\Python27\lib\encodings\cp1251.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xf6' in position 23: character maps to <undefined>

I know that e is English text and r is the Russian translation in cp1251 encoding.

Upvotes: 0

Views: 1575

Answers (1)

Kos

Reputation: 72261

I assume you're using Python 2. (Unicode handling is different in Python 3.)

Use r.decode('cp1251') to decode r from your encoding into unicode. This gives you an object of type unicode.
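A note on the traceback in the question: the u'\xf6' in the error message suggests that r is already a unicode object, so there are no bytes left for .decode() to work on. One possible explanation, and this is an assumption rather than something the question confirms, is that the cp1251 bytes were interpreted as Latin-1 when the file was read. Under that assumption, encoding back to latin-1 recovers the original bytes, which can then be decoded as cp1251. A minimal sketch, written for Python 3 (the u'' literals also work in Python 2); repair_cp1251 is a hypothetical helper name and the sample string is made up:

```python
def repair_cp1251(mojibake):
    """Undo a wrong Latin-1 decode of bytes that were really cp1251.

    Assumption: every character in `mojibake` is below U+0100, i.e. the
    text really is the result of reading cp1251 bytes as Latin-1.
    """
    original_bytes = mojibake.encode('latin-1')  # text -> the raw bytes
    return original_bytes.decode('cp1251')       # raw bytes -> correct text


# Hypothetical cell value: the cp1251 bytes F2 E5 EA F1 F2 read as Latin-1.
garbled = u'\xf2\xe5\xea\xf1\xf2'
fixed = repair_cp1251(garbled)
```

If the cell contains characters that do not fit in Latin-1, the encode step raises UnicodeEncodeError, which is a sign that this assumption does not hold for that cell.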

Note that if you try to print it, it is first implicitly encoded, i.e. converted back to str using the output encoding (the console's code page, or ASCII when Python cannot detect one). If your console supports Unicode, you can print it with:

print xyz.encode('utf-8')
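To see why the choice of output encoding matters, here is a small sketch that encodes one Cyrillic character in three ways. The sample character is my own choice, not taken from the question, and the code is written so that it runs on Python 3 as well as Python 2:

```python
text = u'\u0446'  # Cyrillic small letter "tse"

# The same character becomes different bytes depending on the encoding.
utf8_bytes = text.encode('utf-8')      # two bytes in UTF-8
cp1251_bytes = text.encode('cp1251')   # a single byte in cp1251

# ASCII has no Cyrillic letters, so an ASCII encode fails. This is the
# kind of error an implicit encode during print can trigger.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Encoding explicitly, as in the print line above, puts you in control of which of these conversions happens instead of leaving it to the default.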

Note that a Python 2 str is a sequence of 8-bit bytes, while unicode represents actual text in which each character can be any Unicode character. (In Python 3, str was replaced by bytes and unicode was renamed to str to make this more obvious.)

.decode() on a str allows you to get a "meaningful" unicode string out of some bytes (that you read from somewhere) using an encoding you specify, while .encode() on a unicode object does the opposite: it allows you to get the byte representation of a string using an encoding of your choice.
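The two directions can be sketched as a round trip. The helper below, to_text, is a hypothetical name introduced for illustration: it decodes only when the value is still raw bytes, which avoids calling .decode() on something that is already text. The sample bytes are made up, and the code is written to run on Python 3 (on Python 2, bytes is simply an alias for str):

```python
def to_text(value, encoding='cp1251'):
    """Return text, decoding only if `value` is still raw bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)   # bytes -> text
    return value                        # already text, leave untouched


raw = b'\xcf\xf0\xe8\xe2\xe5\xf2'       # a Russian word encoded as cp1251
text = to_text(raw)                      # decode: bytes -> text
round_trip = text.encode('cp1251')       # encode: text -> bytes again
as_utf8 = text.encode('utf-8')           # same text, different byte form
```

Passing an already decoded value through to_text returns it unchanged, so the helper is safe to apply to every cell regardless of what the reader library hands back.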

Upvotes: 2
