3xpl017

Reputation: 37

Python3 handling non-ASCII characters in a weird way

I was trying to solve a pwnable with Python 3. For that I need to print some characters that are not in the ASCII range.

Python 3 converts these characters into a multi-byte Unicode encoding instead of emitting the raw byte.

For example if I print "\xff" in Python 3, I get this:

root@kali:~# python3 -c 'print("\xff")' | xxd
00000000: c3bf 0a                                  ...

\xff gets converted to \xc3\xbf

But in Python 2 it works as expected, like this:

root@kali:~# python -c 'print("\xff")' | xxd
00000000: ff0a                                     ..

So how can I print it like that in Python 3?

Upvotes: 2

Views: 1022

Answers (2)

Mad Physicist

Reputation: 114310

In Python 2, str and bytes were the same thing, so when you wrote '\xff', the result contained the actual byte 0xFF.

In Python 3, str is closer to Python 2's unicode object, and is not an alias for bytes. \xff is no longer a request to insert a byte, but rather a request to insert the Unicode character whose code point is 0xFF (U+00FF). When the string is printed, it is encoded with your default encoding (probably UTF-8), in which U+00FF becomes the two bytes \xc3\xbf. Inside a str literal, \x is basically the two-hex-digit version of \u. Inside a bytes literal it still means a single raw byte, as before.
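A quick check in the interpreter illustrates the difference. This is only a small sketch; the variable names are made up for illustration:

```python
# In a str literal, "\xff" is the single code point U+00FF, not a raw byte.
s = "\xff"

utf8_bytes = s.encode("utf-8")      # what print() emits when stdout is UTF-8
latin1_bytes = s.encode("latin-1")  # Latin-1 maps U+0000..U+00FF to the same byte value

print(len(s), hex(ord(s)))  # 1 0xff
print(utf8_bytes)           # b'\xc3\xbf'
print(latin1_bytes)         # b'\xff'
print(s == "\u00ff")        # True
```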

Now for a solution. If you just want some bytes, do

b'\xff'

That will work the same as in Python 2. You can write these bytes to a file opened in binary mode, but you can't pass them to print directly: print converts everything to str, so you would get the text b'\xff' rather than the byte itself, and that text is then encoded by the text-mode stream. Luckily, sys.stdout has a buffer attribute that lets you output bytes directly:

import sys
sys.stdout.buffer.write(b'\xff\n')

This only works as long as sys.stdout has not been replaced with an object that lacks a buffer attribute.
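If you are not sure whether the stream has a buffer, one option is to check before writing. The following is a minimal sketch, not a standard API: write_raw is a hypothetical helper name, and raising TypeError is just one possible way to handle the missing attribute.

```python
import sys


def write_raw(data: bytes, stream=None) -> None:
    """Write raw bytes through a text stream's underlying binary buffer.

    Hypothetical helper: raises TypeError if the stream has no .buffer.
    """
    stream = sys.stdout if stream is None else stream
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        raise TypeError("stream has no .buffer attribute; cannot write raw bytes")
    stream.flush()      # flush pending text first so output order is preserved
    buffer.write(data)  # bytes go out untouched, no encoding step
    buffer.flush()
```

Calling write_raw(b'\xff\n') with no second argument writes to sys.stdout, which should give the same ff0a output under xxd as the Python 2 example in the question.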

Upvotes: 3

Mark Tolonen

Reputation: 177674

In Python 2, print '\xff' writes a byte string directly to the terminal, so you get exactly the byte you print.

In Python 3, print('\xff') takes the Unicode character U+00FF and encodes it with the default encoding before writing it to the terminal. In your case that encoding is UTF-8.

To directly output bytes to the terminal in Python 3 you can't use print, but you can use the following to skip encoding and write a byte string:

python3 -c "import sys; sys.stdout.buffer.write(b'\xff')"
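For a pwnable, the same idea extends to a whole payload: build it as bytes from the start and never go through str. The sketch below is illustrative only. build_payload, the padding length, and the address 0xDEADBEEF are all hypothetical placeholders, and a 32-bit little-endian target is assumed.

```python
import struct


def build_payload(padding_len: int, address: int) -> bytes:
    """Return filler bytes followed by a 32-bit little-endian address.

    Both arguments are placeholders; real values depend on the target binary.
    """
    return b"A" * padding_len + struct.pack("<I", address)


payload = build_payload(8, 0xDEADBEEF)
print(payload.hex())  # 4141414141414141efbeadde
```

The resulting bytes can then be sent with sys.stdout.buffer.write(payload) instead of print.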

Upvotes: 4
