Vadim Tikanov
Vadim Tikanov

Reputation: 671

"UnicodeEncodeError: 'charmap' codec can't encode characters" when trying to parse .xlsx by openpyxl

--- update ---

I think this console log nails the issue, however it's still not clear how to fix it:

>>> workbook = openpyxl.load_workbook('data.xlsx')
>>> worksheet = workbook.active
>>> worksheet['A2'].value
u'\u041c\u0435\u0448\u043e\u043a \u0434\u0435\u043d\u0435\u0433'
>>> print worksheet['A2'].value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-4: character maps to <undefined>

--- end update ---

I'm trying to print the values of some .xlsx cells using openpyxl:

import openpyxl
workbook = openpyxl.load_workbook(filename='puzzles.xlsx')
worksheet = workbook.active
for row in worksheet.iter_rows('A2:K5'):
    print row[0].value

Which results in the following error:

Traceback (most recent call last):
  File "xls_import.py", line 8, in <module>
    print row[0].value
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-4: character maps to <undefined>

As far as I know, XLSX is encoded as UTF-8, however:

print row[0].value.decode('utf-8')

does not help either:

Traceback (most recent call last):
  File "xls_import.py", line 8, in <module>
    print row[0].value.decode('utf-8')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

Any suggestions?

I'm running Python 2.7 and openpyxl 2.2.5.

Upvotes: 1

Views: 2658

Answers (1)

Charlie Clark
Charlie Clark

Reputation: 19557

openpyxl returns unicode strings (XML itself is encoded in UTF-8) so you don't need to decode them (decoding goes from an encoding to unicode) but encode them in encoding of your choice.

Upvotes: 1

Related Questions