Reputation: 376
When I output some Chinese characters in Python (Pandas), they are displayed as shown below:
\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85\xe5\x86\xb5\xe6\x98\xaf\xe6\xb2\xb9\xe6\xb3\xb5\xe6\x95\x85\xe9\x9a\x9c\xe7\x81\xaf\xef\xbc\x8c\xe6\xa3\x80\xe6\x9f\xa5\xe4\xb8\x80\xe4\xb8\x8b\xe6\xb2\xb9\xe6\xb3\xb5\xe6\x8f\x92\xe5\xa4\xb4\xe6\x98\xaf\xe5\x90\xa6\xe6\x8e\xa5\xe8\x99\x9a\xef\xbc\x8c\xe7\x84\xb6\xe5\x90\x8e\xe6\x9f\xa5\xe4\xb8\x80\xe4\xb8\x8b\xe6\xb2\xb9\xe6\xb3\xb5\xe5\x86\x85\xe7\xae\xa1\xe9\x81\x93\xe5\x8e\x8b\xe5\x8a\x9b\xe6\x98\xaf\xe5\x90\xa6\xe7\xac\xa6\xe5\x90\x88\xe6\xad\xa3\xe5\xb8\xb8\xe5\x80\xbc\xe3\x80\x82
What encoding is this? As far as I know, it is not Unicode. Thanks!
Upvotes: 0
Views: 10877
Reputation: 2093
The output you are receiving is a bytes object. To decode it, call output.decode('utf-8').
For example:
output = b'\xe8\xbf\x99\xe7...'
unicode_output = output.decode('utf-8')
print(unicode_output)
would then output the non-Latin characters (I cannot include them here because the post would be flagged as spam).
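As a minimal, self-contained sketch of the same step, the snippet below decodes only the first twelve bytes of the output from the question. The variable names simply follow the example above.

```python
# First 12 bytes of the output shown in the question
# (4 characters, 3 bytes each in UTF-8).
output = b'\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85\xe5\x86\xb5'

unicode_output = output.decode('utf-8')  # bytes -> str

print(type(output).__name__)             # bytes
print(type(unicode_output).__name__)     # str
print(len(output), len(unicode_output))  # 12 4
print(unicode_output)
```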
Another way to do this in one line would be:
print(b'\xe8\xbf\x99\xe7...'.decode('utf-8'))
However, if that doesn't work, it is probably because your output isn't a bytes object but a string that contains the escape sequences as literal text. In that case, there is another solution:
output = r'\xe8\xbf\x99\xe7...'  # raw string, so the backslashes stay literal characters
# exec runs whatever the string contains, so only use this on data you trust
exec('print(b\'' + output + '\'.decode(\'utf-8\'))')
That should be able to fix it. Hope you got something useful out of this. Have a good day!
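If running exec on the data is a concern, the sketch below avoids it. It assumes one of two situations that the question does not confirm: either the string holds the escape sequences as literal text (a backslash, an x, and two hex digits), or each character of the string stands for one byte value. Both helper names are hypothetical.

```python
def decode_escaped_text(text):
    """Decode a str holding literal '\\xNN' escapes that spell UTF-8 bytes."""
    # 'unicode_escape' turns each literal \xNN into the character U+00NN;
    # encoding as latin-1 then maps those characters back to single bytes.
    # Assumes the input itself is plain ASCII.
    raw = text.encode('ascii').decode('unicode_escape').encode('latin-1')
    return raw.decode('utf-8')


def decode_byte_valued_text(text):
    """Decode a str in which every character stands for one byte (0-255)."""
    return text.encode('latin-1').decode('utf-8')


literal = r'\xe8\xbf\x99\xe7\xa7\x8d'     # 24 ASCII characters
byte_valued = '\xe8\xbf\x99\xe7\xa7\x8d'  # 6 characters, code points below 256

print(decode_escaped_text(literal) == decode_byte_valued_text(byte_valued))  # True
```

Neither helper executes the input as code, so a malformed string raises a decoding error instead of running anything.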
Upvotes: 1
Reputation: 792
raw_bytes = b'\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85 . . .'
With raw_bytes a <class 'bytes'> object containing the UTF-8 encoded bytes (displayed as hexadecimal escapes), you can call decode on raw_bytes and get a <class 'str'> representation of your characters:
string_text = raw_bytes.decode("utf-8")
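Because the same value may already be a str in some places, a small guard can help. The following is only a sketch: ensure_text is a hypothetical helper, not part of Python or Pandas.

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return value


raw_bytes = b'\xe8\xbf\x99\xe7\xa7\x8d'
string_text = ensure_text(raw_bytes)

print(type(raw_bytes))    # <class 'bytes'>
print(type(string_text))  # <class 'str'>
print(ensure_text(string_text) is string_text)  # True: str passes through unchanged
```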
Upvotes: 0
Reputation: 13495
This is a bytes value containing valid UTF-8 Chinese text (as far as I can trust Google Translate).
If it's a string literal from your code, add # -*- coding: utf-8 -*- as the first line of your Python file (this declaration matters on Python 2; Python 3 already treats source files as UTF-8 by default).
If it's external data, here's how to convert it to text (str type): bytes_text.decode("utf-8")
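For external data that comes from a file, one option is to let open() do the decoding. This sketch first writes a few of the bytes from the question to a temporary file so that it runs on its own; the file name is made up for the example.

```python
import os
import tempfile

bytes_text = b'\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85\xe5\x86\xb5'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(bytes_text)

    # Binary mode returns bytes, which still need an explicit decode.
    with open(path, "rb") as f:
        from_binary = f.read().decode("utf-8")

    # Text mode with an explicit encoding returns str directly.
    with open(path, encoding="utf-8") as f:
        from_text = f.read()

print(from_binary == from_text)  # True
```

Passing encoding="utf-8" explicitly avoids depending on the platform's default encoding.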
Upvotes: 0