Django - POST data in latin1, decode as utf-8

Question

I'm using MySQL (not my choice), with everything set to utf8 and utf8_general_ci. In the normal case everything is UTF-8 and happy.

However, if I POST something like É’s (some latin1 characters) and save it to the database as normal, I can't call .decode('utf-8') on the resulting model field:

>>> myinstance.myfield.decode('utf-8')
...

UnicodeEncodeError: 'ascii' codec can't encode character u'\xc9' in position 7: ordinal not in range(128)

I want to clean all incoming data so that it can be decoded as utf8.

Trying an approach like this just causes the UnicodeEncodeError upfront.

Edit: As Daniel's answer suggests, this question comes from a misunderstanding. latin1 is not the culprit here. Calling .decode('utf-8') on a unicode object makes Python first encode it to ASCII, so it fails for any non-ASCII unicode, such as u'팩맨'.decode('utf-8'). It pains me to leave this question up, knowing what I know now, but maybe it will help someone. Since the data is actually coming back as unicode, what we were trying to do was equivalent to u'É’'.decode('utf-8').

Daniel Roseman · Accepted Answer

Django fields are always unicode. Trying to call decode on them means that Python will try to encode first, to ASCII, before trying to decode as UTF-8. That clearly isn't what you want. I expect you actually just want to do myinstance.myfield.encode('utf-8').
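
A minimal sketch of the point above, not taken from the original code: field_value is a hypothetical stand-in for myinstance.myfield, assumed to already be unicode text. The explicit .encode('ascii') call mimics the implicit step that Python 2 performs when .decode() is called on a unicode object, so the snippet behaves the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-

# Hypothetical stand-in for a value read from a Django model field.
# It is already unicode text, not a byte string.
field_value = u'\xc9cole'  # "École"

# On Python 2, unicode.decode('utf-8') first encodes to ASCII implicitly.
# Doing that step by hand shows where the UnicodeEncodeError comes from.
try:
    field_value.encode('ascii')
    implicit_step_failed = False
except UnicodeEncodeError:
    implicit_step_failed = True

# What is usually wanted instead: turn the text into UTF-8 bytes explicitly.
utf8_bytes = field_value.encode('utf-8')

# Decoding belongs on bytes; it gives back the original text.
round_trip = utf8_bytes.decode('utf-8')
```

Here the ASCII step fails because of the É, while the explicit encode to UTF-8 succeeds and round-trips cleanly.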
