sinkmanu

Reputation: 1102

Printing non-ASCII characters in python3

Python 2 and Python 3 handle strings and bytes differently, so printing a string that contains non-ASCII characters written as hex escapes gives different output in Python 3 than in Python 2.

Why does this happen, and how can I print something in Python 3 the way Python 2 does? (With ASCII characters or UTF-8 it works fine if you decode the byte string.)

Python3:

$ python3 -c 'print("\x41\xb3\xde\x41\x42\x43\xad\xde")' |xxd -p
41c2b3c39e414243c2adc39e0a

Python2:

$ python2 -c 'print "\x41\xb3\xde\x41\x42\x43\xad\xde"' |xxd -p
41b3de414243adde0a

The trailing \x0a is the newline that print adds.

How can I print "\xb3" in Python 3? It outputs "\xc2\xb3" instead of just "\xb3".

$ python3 -c 'print("\xb3")' |xxd
00000000: c2b3 0a                                  ...
$ python2 -c 'print "\xb3"' |xxd
00000000: b30a                                     ..

Upvotes: 0

Views: 1166

Answers (2)

HTF

Reputation: 7260

I'm not sure if this will help in your case but you can use sys.stdout.buffer:

Note

To write or read binary data from/to the standard streams, use the underlying binary buffer object. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').

$ python3 -c 'import sys; sys.stdout.buffer.write(b"\x41\xb3\xde\x41\x42\x43\xad\xde")' | hexdump -C
00000000  41 b3 de 41 42 43 ad de                           |A..ABC..|
00000008

Please also note that there is no newline character now, because it was the print function that added it.

Upvotes: 2

MisterMiyagi

Reputation: 50076

The underlying problem is that in Python 3 str holds Unicode text rather than raw bytes, and print only handles str, so it always applies some encoding (typically UTF-8) when writing to stdout.
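As a minimal sketch of where the extra \xc2 and \xc3 bytes come from, the same three-character string can be encoded both ways and compared. The variable names here are only illustrative, and UTF-8 is assumed to be the stdout encoding, as the output in the question suggests:

```python
# In a Python 3 str literal, "\xb3" is the code point U+00B3, not the byte 0xB3.
text = "\x41\xb3\xde"

# UTF-8 needs two bytes for every code point above U+007F.
utf8_bytes = text.encode("utf-8")

# Latin-1 maps code points U+0000..U+00FF to the single byte of the same value.
latin1_bytes = text.encode("latin-1")

print(utf8_bytes.hex())    # 41c2b3c39e
print(latin1_bytes.hex())  # 41b3de
```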

To write binary data, directly write bytes to the underlying binary pipe of stdout:

python3 -c 'import sys; sys.stdout.buffer.write(b"\x41\xb3\xde\x41\x42\x43\xad\xde")' | xxd -p
41b3de414243adde

Note that the final 0a is missing because .write adds no newline. Manually add it if it is desired.
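If the Python 2 behaviour including the newline is wanted, a small wrapper along these lines might do. write_line is a hypothetical helper name, not part of the standard library, and the demonstration writes to an in-memory io.BytesIO so that the resulting bytes can be inspected:

```python
import io
import sys

def write_line(data: bytes, stream=None) -> None:
    """Write raw bytes followed by a newline, roughly like Python 2's print."""
    # Hypothetical helper: defaults to the binary layer beneath sys.stdout.
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(data + b"\n")
    stream.flush()

# Demonstrate against an in-memory binary stream instead of the real stdout.
demo = io.BytesIO()
write_line(b"\x41\xb3\xde\x41\x42\x43\xad\xde", demo)
print(demo.getvalue().hex())  # 41b3de414243adde0a
```

Calling write_line(data) without a stream argument would send the same bytes to the real stdout.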

In case the data already exists as a string, the latin1 encoding can be used to get equivalent bytes:

python3 -c '
import sys
sys.stdout.buffer.write("\x41\xb3\xde\x41\x42\x43\xad\xde".encode("latin1"))' | xxd -p
41b3de414243adde
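Another option, sketched here under the assumption of Python 3.7 or later, is to switch the text layer itself to Latin-1 with TextIOWrapper.reconfigure so that a plain print emits one byte per character. The example uses an in-memory stand-in for stdout (the names raw and out are illustrative) so the bytes can be checked:

```python
import io

# In-memory stand-in for stdout: a text layer on top of a binary buffer.
# newline="\n" disables platform-specific newline translation on write.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

# Switch the text layer to Latin-1 (available since Python 3.7).
out.reconfigure(encoding="latin-1")

print("\x41\xb3\xde", file=out)
out.flush()
print(raw.getvalue().hex())  # 41b3de0a
```

In a real script the equivalent call would be sys.stdout.reconfigure(encoding="latin-1"), assuming sys.stdout has not been replaced by an object without that method. Characters above U+00FF would then raise UnicodeEncodeError when printed.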

Upvotes: 2
