Reputation: 2269
I have a letter n with tilde (ñ) stored in a field in my database, and my Django application is giving some problems when trying to use it as a string.
When I access the value in the REPL it shows up like this:
>>> person.last_name
u'xxxxxxa\xf1oxxxx'
>>> str(person.last_name)
Traceback (most recent call last):
File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 15: ordinal not in range(128)
Correct me if I'm wrong please - I'm thinking that it's a problem that the \xf1 string is contained inside of a Unicode string, and that it should've been handled differently prior to this value becoming a Unicode string... but I don't know if that's a symptom or the actual disease, as it were.
And so I'm not sure what to do about this. Am I perhaps storing this value incorrectly in the first place? Maybe I just need someone to show me how to decode it correctly? My goal is to write this value to a CSV, which ultimately involves running it through str(). Thanks very much!
Upvotes: 1
Views: 571
Reputation: 1554
You can use Python's encode method to convert a unicode object to a str. The second argument, 'ignore', tells Python to skip any character it can't encode in the given encoding. Note that with 'ascii' this silently drops the ñ, as the first output below shows.
In [1]: foo = u'xxxxxxa\xf1oxxxx'
In [2]: foo.encode('ascii', 'ignore')
Out[2]: 'xxxxxxaoxxxx'
In [3]: foo.encode('utf-8', 'ignore')
Out[3]: 'xxxxxxa\xc3\xb1oxxxx'
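For comparison, here is a minimal sketch of how the error-handler argument changes the result when the target encoding cannot represent a character. The variable names (dropped, replaced, strict_failed) are made up for illustration; the handlers 'ignore', 'replace' and the default 'strict' are standard.

```python
foo = u'xxxxxxa\xf1oxxxx'

# 'ignore' drops the unencodable character entirely (lossy).
dropped = foo.encode('ascii', 'ignore')

# 'replace' substitutes a question mark for it (still lossy, but visible).
replaced = foo.encode('ascii', 'replace')

# The default handler, 'strict', raises instead of losing data.
try:
    foo.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(dropped)
print(replaced)
print(strict_failed)
```

With an encoding that can represent every character, such as UTF-8, the handler is never triggered for this string, so nothing is lost.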
Upvotes: 1
Reputation: 13562
Character ñ is the Unicode character LATIN SMALL LETTER N WITH TILDE (U+00F1), so the unicode string you see is correct. Python shows the escape \xf1, which in the context of a unicode string means character U+00F1.
There is nothing to decode. Rather, if you want to write that unicode string to a byte stream such as a file, you need to encode it.
The issue comes from calling str(foo) where foo is a unicode string. This is equivalent to foo.encode('ascii'), because Python 2 falls back to its default encoding, ASCII, for the implicit conversion. However, the character ñ does not exist in ASCII, hence the error you get.
Instead, if you want a binary, encoded representation of your unicode string, you must know which encoding you want and encode explicitly:
>>> foo = u'xxxxxxa\xf1oxxxx'
>>> foo.encode('utf8')
'xxxxxxa\xc3\xb1oxxxx'
>>> foo.encode('latin1')
'xxxxxxa\xf1oxxxx'
Just be sure to use the encoding your CSV file is expected to be in; otherwise you'll end up with invalid characters.
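To illustrate what a mismatch looks like in practice, here is a small sketch that encodes the string as UTF-8 and then reads the bytes back with the right and the wrong encoding. The variable names are hypothetical, chosen only for this example.

```python
foo = u'xxxxxxa\xf1oxxxx'
utf8_bytes = foo.encode('utf-8')

# Decoding with the encoding that was used for writing round-trips cleanly.
right = utf8_bytes.decode('utf-8')

# Decoding the same bytes as Latin-1 "succeeds" but yields two wrong
# characters (U+00C3 and U+00B1) in place of the single U+00F1.
wrong = utf8_bytes.decode('latin-1')

# The reverse mismatch fails outright: the Latin-1 byte 0xF1 followed by
# an ASCII letter is not a valid UTF-8 sequence.
try:
    foo.encode('latin-1').decode('utf-8')
    reverse_failed = False
except UnicodeDecodeError:
    reverse_failed = True

print(right == foo)
print(repr(wrong))
print(reverse_failed)
```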
The same is true of encoding in Python 3, by the way; the only difference is that your unicode string will be of type str and your encoded string will be of type bytes:
>>> foo = u'xxxxxxa\xf1oxxxx' # note the u prefix is accepted for compatibility but has no effect
>>> foo.encode('utf8')
b'xxxxxxa\xc3\xb1oxxxx'
>>> foo.encode('latin1')
b'xxxxxxa\xf1oxxxx'
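Since the stated goal is a CSV file, here is a minimal Python 3 sketch that lets the file object do the encoding, so there is no str() call at all. The file name people.csv, the temporary directory and the last_name placeholder are assumptions for illustration; in the real application the value would come from person.last_name.

```python
import csv
import os
import tempfile

# Placeholder standing in for person.last_name (assumption).
last_name = 'xxxxxxa\xf1oxxxx'

# Hypothetical output location, used only to keep the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), 'people.csv')

# Opening in text mode with an explicit encoding makes the file object
# encode each str on write; newline='' is what the csv module recommends.
with open(path, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['last_name'])
    writer.writerow([last_name])

# Reading back with the same encoding recovers the original text.
with open(path, 'r', encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))

print(rows)
```

In Python 2 the csv module works on byte strings rather than unicode, so there each value would instead be encoded explicitly (for example with .encode('utf8')) before being passed to the writer.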
Upvotes: 3