Reputation: 987
I have a small question about string conversion in Python 3.
s = '\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'
print(s) gives the output:
1 2 1 0 5 5 0 4 0 0
However, when I try to convert the string using the following:
bytes(s, 'utf16').decode('utf16')
I get '\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'.
What is the way to get the same output as print(s) programmatically?
Upvotes: 1
Views: 336
Reputation: 9533
In the first example, you call print(s), which writes the string s to the console, and the console does not display the \x00 (NUL) characters.
In your last line, you are looking at the value as echoed by the Python prompt. If you print it instead, with print(bytes(s,'utf-16').decode('utf-16')), you get what you want.
So the Python prompt shows you the variable with context (e.g. you also see the ' quote signs and the escape sequences), not the actual characters of the string (which is what print gives you).
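A minimal sketch of that point, using the string from the question: encoding to UTF-16 and decoding again is a round trip, so the result is the same string, NUL characters included. Only the way it is displayed differs. The name round_tripped is made up for this illustration.

```python
s = '\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'

# Encoding to UTF-16 and decoding again returns an equal string.
round_tripped = bytes(s, 'utf-16').decode('utf-16')
print(round_tripped == s)    # True

# The NUL characters are still part of the string in both cases.
print(len(s))                # 21
print(s.count('\x00'))       # 11
```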
ADDENDUM:
print writes out its argument as a string, calling str() on it if necessary. The Python prompt, however, echoes the representation of the value, as given by repr(). So you can use print(repr(bytes(s,'utf-16').decode('utf-16'))) to get the same text you see in the interactive session, but as a string. Instead of printing, you can also assign the result (r = repr(bytes(...).decode(...))), so that r[0] is ', r[1] is \, etc.
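As a small illustration of the addendum (a sketch, using the string from the question): repr() returns an ordinary string whose characters are the quotes and escape sequences that the prompt displays, so it can be indexed like any other string.

```python
s = '\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'

# repr() gives the text the interactive prompt would echo, as a str.
r = repr(s)
print(r)        # '\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'
print(r[0])     # '
print(r[1])     # \

# s holds one NUL character where r holds the four characters \, x, 0, 0.
print(len(s), len(r))   # 21 56
```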
Upvotes: 2
Reputation: 1547
You just need to decode the bytes and drop the NUL characters to get the answer:
x = b'\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'
str1 = x.decode('utf-8')
print(" ".join([i for i in str1 if ord(i) != 0]))
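Note that s in the question is a str, not a bytes object, so no decoding step is strictly needed there. A possible variant, sketched here with str.replace (the name digits is made up for this example), removes the NUL characters directly:

```python
s = '\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'

# Remove every NUL character from the string.
digits = s.replace('\x00', '')
print(digits)             # 1210550400

# Join the remaining characters with spaces.
print(' '.join(digits))   # 1 2 1 0 5 5 0 4 0 0
```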
Second Solution:
x = '1 2 1 0 5 5 0 4 0 0'
str_utf16 = x.encode('utf16')
print("Encoding :",str_utf16)
print("Decoding :",str_utf16.decode('utf16'))
Output:
Encoding : b'\xff\xfe1\x00 \x002\x00 \x001\x00 \x000\x00 \x005\x00 \x005\x00 \x000\x00 \x004\x00 \x000\x00 \x000\x00'
Decoding : 1 2 1 0 5 5 0 4 0 0
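The leading b'\xff\xfe' in the encoded output is the UTF-16 byte order mark (BOM), which the plain 'utf-16' codec writes when encoding and consumes when decoding. The sketch below uses a shortened input string for readability and shows the explicit-endianness codecs, which write no BOM.

```python
import codecs

x = '1 2'

# The plain 'utf-16' codec prepends a BOM in the platform's byte order.
encoded = x.encode('utf-16')
print(encoded[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))   # True

# The explicit little-endian and big-endian codecs write no BOM.
print(x.encode('utf-16-le'))   # b'1\x00 \x002\x00'
print(x.encode('utf-16-be'))   # b'\x001\x00 \x002'

# Decoding with the plain codec consumes the BOM again.
print(encoded.decode('utf-16') == x)   # True
```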
Upvotes: 1