Reputation:
I'm requesting a string from a network-service. When I print it from within a program:
variable = getFromNetwork()
print(variable)
and I execute it using python3 net.py
I get:
\xd8\xaa\xd9\x85\xd9\x84\xd9\x8a612
When I execute in the python3 CLI:
>>> print("\xd8\xaa\xd9\x85\xd9\x84\xd9\x8a612")
تÙ
Ù
Ù612
But when I execute it in the python2 CLI, I get the correct result:
>>> print("\xd8\xaa\xd9\x85\xd9\x84\xd9\x8a612")
تملي612
How can I print this correctly from my program in Python 3?
After executing the following line:
print(type(variable), repr(variable))
I got:
<class 'str'> '\\xd8\\xaa\\xd9\\x85\\xd9\\x84\\xd9\\x8a612'
I think I should first remove the \\x to turn it into hex and then decode it. What are your solutions?
Upvotes: 4
Views: 947
Reputation: 650
In Python 3, I tested with the following code:
line = '\xd8\xaa\xd9\x85\xd9\x84\xd9\x8a612'
line = line.encode('raw_unicode_escape')
line = line.decode("utf-8")
print(line)
It prints:
تملي612
Upvotes: 0
Reputation: 148880
Your variable is a (Unicode) string whose code points are the byte values of a UTF-8 encoded byte string. This can happen when the data was erroneously decoded with the wrong encoding (probably Latin1 here).
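To illustrate how such a string can come about, here is a minimal sketch. It assumes the service sends UTF-8 bytes and that something on the way decodes them as Latin1; the names `original`, `wire_bytes`, `wrong` and `right` are made up for the example and say nothing about how getFromNetwork() actually works.

```python
# The intended text, written with escapes so the source file stays ASCII.
original = "\u062a\u0645\u0644\u064a612"

# Assumption: the service transmits the text as UTF-8 bytes.
wire_bytes = original.encode("utf-8")

# Decoding those bytes as Latin-1 never fails, because every byte value
# maps to a code point, but it yields one character per byte.
wrong = wire_bytes.decode("latin-1")

# Decoding with the encoding that was actually used gives the text back.
right = wire_bytes.decode("utf-8")

print(ascii(wrong))       # shows the \xNN code points of the garbled string
print(right == original)  # True
```

If the raw bytes are available at the point where they are received, decoding them as UTF-8 there avoids the repair step altogether.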
You can fix it by first converting to a byte string without changing the codes (so with a Latin1 encoding) and then you will be able to correctly decode it:
variable = getFromNetwork().encode('Latin1').decode()
print(variable)
Demo:
variable = "\xd8\xaa\xd9\x85\xd9\x84\xd9\x8a612"
print(variable.encode('Latin1').decode())
تملي612
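Note that the repr shown in the question has doubled backslashes, which suggests the real string may contain literal backslash characters rather than the code points used in the demo. The sketch below covers both cases under that assumption; `repair_mojibake` and `repair_escaped` are hypothetical helper names, not part of any library.

```python
def repair_mojibake(text: str) -> str:
    """Re-decode a str whose code points are really UTF-8 byte values."""
    # Latin-1 maps code points 0-255 one-to-one onto bytes 0-255,
    # so encoding with it recovers the original bytes unchanged.
    raw = text.encode("latin-1")
    return raw.decode("utf-8")


def repair_escaped(text: str) -> str:
    """Handle a str that holds literal backslash escapes such as \\xd8."""
    # Step 1: turn the literal escape sequences into the code points they name.
    unescaped = text.encode("latin-1").decode("unicode_escape")
    # Step 2: those code points are UTF-8 byte values, so repair as above.
    return repair_mojibake(unescaped)


if __name__ == "__main__":
    decoded = "\xd8\xaa\xd9\x85\xd9\x84\xd9\x8a612"   # code points U+00D8, U+00AA, ...
    escaped = r"\xd8\xaa\xd9\x85\xd9\x84\xd9\x8a612"  # literal backslash, x, d, 8, ...
    print(repair_mojibake(decoded))
    print(repair_escaped(escaped))
```

Which helper applies depends on what repr(variable) shows: single backslashes mean the first case, doubled backslashes mean the second.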
Upvotes: 2
Reputation: 18106
You need to specify the encoding, so the interpreter knows how to interpret the data:
s = "\xd8\xaa\xd9\x85\xd9\x84\xd9\x8a612"
y = s.encode('raw_unicode_escape')
print(y)  # is a bytes object now!
print(y.decode('utf-8'))
Out:
b'\xd8\xaa\xd9\x85\xd9\x84\xd9\x8a612'
تملي612
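For this particular string, 'raw_unicode_escape' and 'latin-1' give the same bytes, since every code point is below 256. A small sketch of where the two codecs part ways, using a made-up string `mixed` that appends one Arabic letter:

```python
s = "\xd8\xaa\xd9\x85\xd9\x84\xd9\x8a612"

# For code points 0-255 both codecs produce identical bytes.
via_latin1 = s.encode("latin-1")
via_raw = s.encode("raw_unicode_escape")
print(via_latin1 == via_raw)  # True

# They differ once a code point above 255 appears (here U+062A).
mixed = s + "\u062a"

# raw_unicode_escape writes that character out as an ASCII \uXXXX escape.
print(mixed.encode("raw_unicode_escape"))

# latin-1 cannot represent it and raises instead.
try:
    mixed.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 refuses:", exc.reason)
```

So 'latin-1' fails loudly on input that is not pure byte-valued text, while 'raw_unicode_escape' silently passes it through as escape sequences; which behaviour is preferable depends on how much the input can be trusted.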
Upvotes: 3