Python 2.7 UnicodeEncode error

Question

I have a letter n with tilde (ñ) stored in a field in my database, and my Django application is giving some problems when trying to use it as a string.

When I access the value in the REPL it shows up like this:

>>> person.last_name
u'xxxxxxa\xf1oxxxx'
>>> str(person.last_name)
Traceback (most recent call last):
  File "", line 1, in 
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 15: ordinal not in range(128)

Correct me if I'm wrong please - I'm thinking that it's a problem that the \xf1 string is contained inside of a Unicode string, and that it should've been handled differently prior to this value becoming a Unicode string... but I don't know if that's a symptom or the actual disease, as it were.

And so I'm not sure what to do about this. Am I perhaps storing this value incorrectly in the first place? Maybe I just need someone to show me how to decode it correctly? My goal is to write this value to a CSV, which ultimately involves running it through str(). Thanks very much!

spectras · Accepted Answer

The character ñ is the Unicode character LATIN SMALL LETTER N WITH TILDE (U+00F1), so the unicode string you see is correct. Python displays it as the escape \xf1, which inside a unicode string means the code point U+00F1, not a raw byte.

There is nothing to decode; rather, if you want to write that unicode string to a byte stream such as a file, you need to encode it.

The issue comes from calling str(foo) where foo is a unicode string. In Python 2 this is equivalent to foo.encode('ascii'), because ASCII is the default encoding unless it has been changed. The character ñ does not exist in ASCII, hence the error you get.
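To see the same failure without going through str(), here is a minimal sketch that encodes the example string to ASCII explicitly and inspects the exception. The variable names are only illustrative; the attributes used (encoding, start) are standard on UnicodeEncodeError.

```python
foo = u'xxxxxxa\xf1oxxxx'

try:
    # Explicit version of what str(foo) attempts in Python 2.
    foo.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

if error is not None:
    # error.start is the index of the first character that could not be encoded.
    print(error.encoding, error.start, repr(foo[error.start]))
```

The reported index points at the ñ, which confirms the string itself is fine and only the choice of codec is the problem.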

Instead, if you want a binary, encoded representation of your unicode string, you must know which encoding you want and encode explicitly:

>>> foo = u'xxxxxxa\xf1oxxxx'
>>> foo.encode('utf8')
'xxxxxxa\xc3\xb1oxxxx'
>>> foo.encode('latin1')
'xxxxxxa\xf1oxxxx'

Just be sure to use the encoding your CSV file is supposed to be in; otherwise whoever reads it will see invalid characters.
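Since the goal is a CSV, here is a rough sketch of how the encoding step could fit into the writing code. The helper name write_csv, the UTF-8 default and the sample row are assumptions for illustration, not something from the question. On Python 2 the csv module works on byte strings, so each unicode field is encoded by hand; on Python 3 the file object does the encoding.

```python
import csv
import os
import sys
import tempfile

PY2 = sys.version_info[0] == 2


def write_csv(path, rows, encoding='utf-8'):
    """Hypothetical helper: write rows of text values to path in `encoding`."""
    if PY2:
        # Python 2: csv expects byte strings, so encode unicode fields
        # explicitly and open the file in binary mode.
        with open(path, 'wb') as fh:
            writer = csv.writer(fh)
            for row in rows:
                writer.writerow([
                    v.encode(encoding) if isinstance(v, unicode) else v  # noqa: F821
                    for v in row
                ])
    else:
        # Python 3: csv expects text; the file object encodes on write.
        with open(path, 'w', encoding=encoding, newline='') as fh:
            csv.writer(fh).writerows(rows)


# Usage: write one row to a temporary file, then read the raw bytes back.
fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
try:
    write_csv(path, [[u'xxxxxxa\xf1oxxxx', u'Jane']])
    with open(path, 'rb') as fh:
        data = fh.read()
finally:
    os.remove(path)
```

With the UTF-8 default, the ñ ends up in the file as the two bytes \xc3\xb1. Pass encoding='latin1' instead if that is what the consumer of the CSV expects.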

The same is true in Python 3, by the way, except that your unicode string will be of type str and your encoded string will be of type bytes:

>>> foo = u'xxxxxxa\xf1oxxxx'  # note the u prefix is accepted for compatibility but has no effect
>>> foo.encode('utf8')
b'xxxxxxa\xc3\xb1oxxxx'
>>> foo.encode('latin1')
b'xxxxxxa\xf1oxxxx'
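In either Python version, the encoded bytes only read back correctly with the codec that produced them. This small sketch (variable names are illustrative) encodes the example string as UTF-8, then decodes it once with the right codec and once with latin-1 to show what a mismatch looks like.

```python
foo = u'xxxxxxa\xf1oxxxx'

utf8_bytes = foo.encode('utf-8')

# Decoding with the same codec gives back the original text.
round_trip = utf8_bytes.decode('utf-8')

# Decoding with a different codec does not fail here, but each of the two
# UTF-8 bytes for the n with tilde becomes its own (wrong) character.
mismatched = utf8_bytes.decode('latin-1')

print(round_trip == foo, mismatched == foo)
```

This is the "invalid characters" case: a program that assumes latin-1 while reading a UTF-8 file shows two garbage characters where the ñ should be.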
