Reputation: 177
I connect to a MySQL database using pymysql, and after executing a query I got the following string: \xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0.
This should be 5 characters in UTF-8, but when I do print s.encode('utf-8') I get this: ╨╝╨░╤А╨║╨░. The string looks like the byte representation of Unicode characters, which Python fails to recognize.
So what do I do to make Python process them properly?
Upvotes: 2
Views: 4168
Reputation: 837926
You want to decode (not encode) to get a unicode string from a byte string.
>>> s = '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'
>>> us = s.decode('utf-8')
>>> print us
марка
Note that you may not be able to print it because it contains characters outside ASCII. But you should be able to see its value in a Unicode-aware debugger. I ran the above in IDLE.
Update
It seems what you actually have is this:
>>> s = u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'
This is trickier because you first have to get those bytes into a bytestring before you call decode. I'm not sure what the "best" way to do that is, but this works:
>>> us = ''.join(chr(ord(c)) for c in s).decode('utf-8')
>>> print us
марка
Note that you should of course be decoding it before you store it in the database as a string.
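One possible alternative to the chr/ord loop, as a sketch rather than the definitive fix: Latin-1 maps code points 0 to 255 onto the identical byte values, so encoding the mis-decoded string as 'latin-1' should give back the original bytes, which can then be decoded as UTF-8. The names raw and us below are just for illustration, and this assumes every character in s is below U+0100, as in the example. The u'' and b'' prefixes are there so the snippet runs on both Python 2.7 and Python 3.

```python
# The mis-decoded text: each "character" is really one UTF-8 byte.
s = u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

# Latin-1 maps code points 0-255 to the same byte values, so this
# recovers the original bytes without a per-character loop.
raw = s.encode('latin-1')

# Decode the bytes with the encoding they were actually written in.
us = raw.decode('utf-8')

print(len(s), len(us))  # 10 code points before, 5 characters after
```

If s ever contains a character above U+00FF, the encode step raises UnicodeEncodeError, which is a useful signal that the string was not mangled in this particular way.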
Upvotes: 5
Reputation: 375484
Mark is right: you need to decode the string. Byte strings become Unicode strings by decoding them; encoding goes the other way. This and many other details are at Pragmatic Unicode, or, How Do I Stop The Pain?.
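To make the two directions concrete, here is a small sketch using the bytes from the question. The variable names are made up for illustration, and the u''/b'' prefixes keep it valid on both Python 2.7 and Python 3.

```python
# Bytes as they might arrive from a database driver or a socket.
data = b'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

# decode: bytes -> Unicode text
text = data.decode('utf-8')

# encode: Unicode text -> bytes
round_trip = text.encode('utf-8')

print(len(data), len(text))  # ten bytes, five characters
```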
Upvotes: 4