Reputation: 11464
Using MySQL (not my choice), everything is set to utf8, utf8_general_ci. In the normal case everything is utf8 and happy.
However, if I POST something like É’s, some latin1 text, and save it into the database as normal, I can't call .decode('utf-8') on the resulting model field:
>>> myinstance.myfield.decode('utf-8')
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc9' in position 7: ordinal not in range(128)
I want to clean all incoming data so that it can be decoded as utf8.
Trying an approach like this just causes the UnicodeEncodeError upfront.
Edit: As Daniel's answer suggests, this question comes from a misunderstanding. latin1 is not the culprit here. Calling .decode('utf-8') on a unicode object makes Python first try to encode it to ASCII, so it fails for any non-ASCII unicode string, for example u'팩맨'.decode('utf-8'). It pains me to leave this question up, knowing what I know now, but maybe it will help someone. Since the data was actually coming back as unicode, what we were trying to do was equivalent to u'É’'.decode('utf-8').
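To make the hidden step visible, here is a minimal sketch that performs the ASCII encode explicitly instead of letting Python 2 do it implicitly. The variable names are only illustrative, and the snippet is written so that it also runs on Python 3, where unicode strings have no decode method at all.

```python
# -*- coding: utf-8 -*-
# A unicode string such as a Django model field would return (u'É').
text = u'\xc9'

# In Python 2, text.decode('utf-8') behaves like
# text.encode('ascii').decode('utf-8'); the first half is what fails.
try:
    text.encode('ascii')
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__

# Going from text to bytes is an encode, and that works fine.
utf8_bytes = text.encode('utf-8')

# Decoding those bytes gives back the original text.
round_trip = utf8_bytes.decode('utf-8')
```

The UnicodeEncodeError in the traceback above comes from that first encode step, not from the UTF-8 decode.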
Upvotes: 1
Views: 2078
Reputation: 599590
Django fields are always unicode. Calling decode on them means that Python will first try to encode to ASCII before decoding as UTF-8, which clearly isn't what you want. I expect you actually just want myinstance.myfield.encode('utf-8').
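If the goal is to end up with UTF-8 bytes whatever type comes in, something along these lines might do. This is only a sketch: to_utf8_bytes is a hypothetical helper, not part of Django, and the latin-1 fallback for undecodable bytes is an assumption based on the question.

```python
def to_utf8_bytes(value, fallback='latin-1'):
    """Return value as UTF-8 encoded bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        try:
            # Already valid UTF-8: hand it back untouched.
            value.decode('utf-8')
            return value
        except UnicodeDecodeError:
            # Assumption: anything that is not UTF-8 is latin-1.
            return value.decode(fallback).encode('utf-8')
    # Text (unicode in Python 2, str in Python 3) only needs encoding.
    return value.encode('utf-8')
```

For a value read from a model field, to_utf8_bytes(myinstance.myfield) would take the last branch, which is the same as calling .encode('utf-8') directly.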
Upvotes: 1