Reputation: 35
I am working with German words (some of them containing umlaut characters) in an Excel 2007 spreadsheet (I use xlrd, xlwt and openpyxl), and I read each value like this:
var = str(ws.cell(row=i+k,column=0).value).encode('latin-1')
With print(var) I get:
'[a word')
until the loop reaches a word containing umlaut characters, when I get:
Traceback (most recent call last):
File "C:\Users\cristina\Documents\horia\Linguistics3\px t3.py", line 68, in <module>
var = str(ws4.cell(row=i+k,column=0).value).encode('latin-1')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in position 3: ordinal not in range(128)
And the program stops.
If I define var as:
var = u'str(ws4.cell(row=i+k,column=0).value)'.encode('latin-1')
then, when I try print(var), I get:
var=str(ws.cell(row=i+k,column=0).value)
The program then runs normally until the end.
I can get the value of var in the Python shell, but not with print(var) in the program.
Can anybody give me a solution?
Upvotes: 0
Views: 5391
Reputation: 110476
First of all, read this: http://www.joelonsoftware.com/articles/Unicode.html (seriously)
Then, understand that Python 2 has two distinct data types: unicode, for "agnostic" handling of all possible characters, which cannot be used in input/output (such as print or writing to files) without first being encoded into the other data type: strings.
Strings are encoding-dependent.
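To make the distinction concrete, here is a minimal sketch. The word Straße is just a sample value chosen for illustration, not something from the spreadsheet, and the snippet is written so that it should run on both Python 2 and Python 3:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Text (unicode on Python 2, str on Python 3); \xdf is the German sharp s.
word = u'Stra\xdfe'

# Encoding turns text into bytes; the result depends on the codec chosen.
as_latin1 = word.encode('latin-1')  # one byte per character for this word
as_utf8 = word.encode('utf-8')      # the sharp s takes two bytes in UTF-8

print(len(word), len(as_latin1), len(as_utf8))  # prints: 6 6 7

# Decoding with the matching codec gives back the original text.
assert as_latin1.decode('latin-1') == word
assert as_utf8.decode('utf-8') == word
```

The same six characters become six bytes in latin-1 but seven in UTF-8, which is what "encoding-dependent" means in practice.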
What I am almost sure is going on there, given your error message, is that the ws4.cell(row=i+k,column=0).value
call is returning a unicode value. (I can't test it in my non-Windows environment here.) To be sure instead of guessing, you may want to run things there once with
print(type(ws4.cell(row=i+k,column=0).value))
just to confirm you are getting unicode values.
Thus, when you do str(ws4.(...).value)
you are telling Python to convert unicode to str without naming any encoding, so Python 2 falls back to its default ASCII codec - that is the call that raises your error, not the subsequent "encode" call.
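On Python 2, calling str() on a unicode value encodes it implicitly with the default codec, which is normally ASCII. The sketch below reproduces the same failure with an explicit ASCII encode, so that it behaves the same on Python 2 and 3. The sample word is an assumption, picked so that the offending character sits at position 3 as in the traceback:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

value = u'Spa\xdf'  # hypothetical cell content; \xdf (sharp s) is at index 3

try:
    # Roughly what str(value) does on Python 2: an implicit ASCII encode.
    value.encode('ascii')
    failed_at = None
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character ASCII cannot represent.
    failed_at = exc.start
    print('ASCII encode failed at position', failed_at)

# Naming a codec that actually contains the character works.
encoded = value.encode('latin-1')
```

The error comes from the ASCII step alone; the latin-1 encode on the last line succeeds once it receives the text directly.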
If that is what is going on, simply replace that str
call with unicode
:
var = unicode(ws4.cell(row=i+k,column=0).value).encode('latin-1')
That should fix your problem. I hope you've read the article I linked above - it is helpful.
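If you read many cells, the str-to-unicode replacement could be wrapped in a small helper. This is only a sketch: cell_to_latin1 and text_type are hypothetical names, and the version check is there only so the same snippet also runs on Python 3, where unicode no longer exists:

```python
# -*- coding: utf-8 -*-
import sys

# 'unicode' exists only on Python 2; on Python 3 the text type is str.
text_type = unicode if sys.version_info[0] == 2 else str


def cell_to_latin1(value):
    """Return a cell value as latin-1 bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        # Already a byte string (str on Python 2): leave it untouched.
        return value
    # Numbers and text are converted to text first, then encoded explicitly,
    # so no implicit ASCII conversion ever takes place.
    return text_type(value).encode('latin-1')
```

In the loop this would be used as var = cell_to_latin1(ws4.cell(row=i+k,column=0).value). Note that a character outside latin-1 would still raise UnicodeEncodeError here, which is one more reason to prefer utf-8 where you can.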
Also, mark your Python source code with the corresponding encoding you are using - otherwise you will get an error on any non-ASCII char in your source code.
For example, write this on the very first line of your code:
# coding: latin1
(Although for any serious project you should be using utf-8 instead.)
Upvotes: 2