user1866080

Reputation: 35

How to print German diacritic characters in Python 2.7?

I am working with German words (some of them containing umlaut characters) in an Excel 2007 spreadsheet (I use xlrd, xlwt and openpyxl), and I read each value like this:

var = str(ws.cell(row=i+k,column=0).value).encode('latin-1')

With print(var) I get:

[a word]

until I reach a word containing umlaut characters, and then I get:

Traceback (most recent call last):
  File "C:\Users\cristina\Documents\horia\Linguistics3\px t3.py", line 68, in <module>
    var = str(ws4.cell(row=i+k,column=0).value).encode('latin-1')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in position 3: ordinal not in range(128)

And the program stops.

If I define var as:

var = u'str(ws4.cell(row=i+k,column=0).value)'.encode('latin-1')

then, when I call print(var), I get:

var=str(ws.cell(row=i+k,column=0).value)

The program runs normally until the end.

I can see the value of var in the Python shell, but not with print(var) inside the program.

Can anybody give me a solution?

Upvotes: 0

Views: 5391

Answers (1)

jsbueno

Reputation: 110476

First of all, read this: http://www.joelonsoftware.com/articles/Unicode.html (seriously)

Then understand that Python 2 has two distinct data types for text: unicode, for "agnostic" handling of all possible characters, which cannot be used for input/output, such as print or writing to files, without first being encoded into the other data type: str (byte strings).
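A minimal sketch of that split, written so it runs on both Python 2.7 and 3 (u'Gru\xdf', the word "Gruß", is just a sample value): text stays unicode inside the program, is encoded to bytes only at the output boundary, and is decoded back to unicode when read in.

```python
# Sample unicode text: u'Gru\xdf' is "Gruß" (\xdf is the character ß).
word = u'Gru\xdf'

# Encode text to bytes before it leaves the program (print, files, sockets)...
data = word.encode('utf-8')

# ...and decode bytes back to text when they come in.
back = data.decode('utf-8')

print(repr(data))
print(back == word)
```

Note that the byte string is one item longer than the text here, because UTF-8 needs two bytes for ß.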

Byte strings are encoding-dependent: the same text becomes different bytes in different encodings.
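To illustrate, this sketch encodes the same sample word with latin-1 and with UTF-8 (chosen only as examples). Decoding with the wrong codec does not necessarily raise an error; it can silently produce the wrong characters.

```python
word = u'Gru\xdf'  # sample word "Gruß"

latin1_bytes = word.encode('latin-1')  # ß becomes the single byte 0xDF
utf8_bytes = word.encode('utf-8')      # ß becomes the two bytes 0xC3 0x9F

print(repr(latin1_bytes))
print(repr(utf8_bytes))

# Reading UTF-8 bytes as if they were latin-1 gives two wrong characters.
garbled = utf8_bytes.decode('latin-1')
print(repr(garbled))
```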

What I am almost sure is going on, given your error message, is that the ws4.cell(row=i+k,column=0).value call is returning a unicode value. (I can't test it here, since I am not on Windows.) To be sure instead of guessing, you may want to run print(type(ws4.cell(row=i+k,column=0).value)) once, just to confirm you are getting unicode values.
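If you would rather not rely on reading the printed type name, a check along these lines could classify the values. This is only a sketch: kind_of and the sample values are hypothetical stand-ins for whatever ws4.cell(...).value actually returns, and type(u'') / type(b'') are used so the same code works on 2.7 and 3.

```python
TEXT_TYPE = type(u'')   # unicode on Python 2.7, str on Python 3
BYTES_TYPE = type(b'')  # str on Python 2.7, bytes on Python 3


def kind_of(value):
    """Hypothetical helper: classify a cell value as 'text', 'bytes' or 'other'."""
    if isinstance(value, TEXT_TYPE):
        return 'text'
    if isinstance(value, BYTES_TYPE):
        return 'bytes'
    return 'other'  # numbers, None (empty cells), dates, ...


# Hypothetical sample values standing in for real cell contents.
for sample in (u'Gru\xdf', b'Gru\xdf', 42.0, None):
    print(kind_of(sample))
```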

Thus, when you do str(ws4.(...).value) you are telling Python to convert the unicode value to str using the default ASCII codec, which cannot represent characters such as ß (u'\xdf'). That is the call that raises your error, not the subsequent .encode('latin-1') call.
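The failure can be reproduced without the spreadsheet. Assuming Python 2.7's default encoding is still ASCII, str(word) on a unicode word behaves like word.encode('ascii'), so this sketch calls that explicitly (which also makes it behave the same on Python 3). In the sample word "Gruß", ß sits at index 3, the same position reported in the traceback above.

```python
word = u'Gru\xdf'  # sample word "Gruß"; \xdf (ß) is at index 3

error = None
try:
    # Roughly what Python 2.7's str(word) does with an ASCII default encoding.
    word.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc  # keep a reference: Python 3 unbinds 'exc' after this block
    print(exc)

# Naming a codec that can represent ß works.
fixed = word.encode('latin-1')
print(repr(fixed))
```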

If that is what is going on, simply replace that str call with unicode:

var = unicode(ws4.cell(row=i+k,column=0).value).encode('latin-1')

That should fix your problem. I hope you've read the article I linked above; it is helpful.
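As a slightly more defensive variant, assuming the cells hold either unicode text or numbers, the conversion could be wrapped in a small helper. cell_to_latin1 is a hypothetical name, and u'%s' % (value,) is used instead of unicode(value) only so the sketch also runs on Python 3; on 2.7 the two give the same result for numbers.

```python
TEXT_TYPE = type(u'')  # unicode on Python 2.7, str on Python 3


def cell_to_latin1(value):
    """Hypothetical helper: return a cell value as latin-1 encoded bytes.

    Meant to match unicode(value).encode('latin-1') on Python 2.7.
    Assumes value is unicode text or a number, not a non-ASCII byte string.
    """
    if not isinstance(value, TEXT_TYPE):
        value = u'%s' % (value,)  # numbers become their unicode text form
    return value.encode('latin-1')


print(repr(cell_to_latin1(u'Gru\xdf')))
print(repr(cell_to_latin1(3)))
```

Keep in mind that empty cells (None) would come out as 'None', just as with unicode(None), and that latin-1 covers German umlauts and ß but not, for example, the euro sign, for which encode('latin-1') would still raise UnicodeEncodeError.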

Also, mark your Python source file with the encoding it is saved in; otherwise you will get a SyntaxError on any non-ASCII character in your source code.

For example, write this on the very first line of your code:

# coding: latin1

(Although for any serious project you should be using utf-8 instead.)
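For example, a UTF-8 source file could start like this. It is a sketch: the declaration only tells Python how to read the file, so your editor must actually save the file as UTF-8, and the declaration has to be on the first or second line.

```python
# -*- coding: utf-8 -*-
# With the declaration above, non-ASCII characters are allowed in the source.
# The u prefix makes the literal unicode text on Python 2.7 (optional on 3).
word = u'Gruß'

print(repr(word))
```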

Upvotes: 2
