A. Vaicenavicius

Reputation: 97

How to read binary strings in Python

I am wondering how the binary encoding of a string is represented in Python.

For example,

>>> b'\x25'
b'%'

or

>>> b'\xe2\x82\xac'.decode()
'€'

but

>>> b'\xy9'
File "<stdin>", line 1
SyntaxError: (value error) invalid \x escape at position 0

Could you please explain what \xe2 stands for and how this binary encoding works?

Upvotes: 1

Views: 1256

Answers (2)

Serge Ballesta

Reputation: 149155

I assume that you are using Python 3. In Python 3 the default encoding for bytes.decode() is UTF-8, so b'\xe2\x82\xac'.decode() is in fact b'\xe2\x82\xac'.decode('UTF-8').

It gives the character '€', which is U+20AC in Unicode, and the UTF-8 encoding of U+20AC is indeed the 3 bytes b'\xe2\x82\xac'.

So all ASCII characters (code points below 128) are encoded as a single byte with the same value as the Unicode code point. Non-ASCII characters that fit in a single 16-bit Unicode value (the range known as the Basic Multilingual Plane) are encoded in UTF-8 as 2 or 3 bytes. Characters outside that plane take 4 bytes.
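As a rough illustration of those byte counts, here is a minimal sketch. The sample characters and the helper name utf8_length are picked for this example only and are not part of the original answer.

```python
# Sample characters chosen for illustration (hypothetical picks, written as
# escapes so the source stays pure ASCII):
#   "A" (ASCII), U+00E9 (Latin-1 range), U+20AC (euro sign, in the BMP),
#   U+1F600 (outside the BMP).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_length(ch):
    """Return how many bytes the UTF-8 encoding of one character uses."""
    return len(ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    # Print the code point, the encoded bytes and the byte count.
    print(f"U+{ord(ch):04X} -> {encoded!r} ({utf8_length(ch)} bytes)")
```

The euro sign line should show the same three bytes as in the question, b'\xe2\x82\xac'.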

Upvotes: 1

chepner

Reputation: 532238

\x is used to introduce a hexadecimal value, and must be followed by exactly two hexadecimal digits. For example, \xe2 represents the byte (in decimal) 226 (= 14 * 16 + 2). This is also why b'\xy9' is rejected: y is not a hexadecimal digit.
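The arithmetic can be checked directly in the interpreter. This is only a small sketch; the variable names are made up for the example.

```python
# Indexing a bytes object yields the integer value of that byte.
byte_value = b'\xe2'[0]
print(byte_value)                 # 226

# The same value obtained by parsing the two hex digits, and by hand.
parsed = int('e2', 16)
by_hand = 14 * 16 + 2             # 'e' is 14, '2' is 2
print(parsed, by_hand)            # 226 226

# 'y' is not a hexadecimal digit, so parsing 'y9' as base 16 fails,
# much like the literal b'\xy9' fails at compile time.
try:
    int('y9', 16)
    parse_failed = False
except ValueError:
    parse_failed = True
print(parse_failed)               # True
```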

In the first case, the two literals b'\x25' and b'%' are identical; Python displays byte values using their printable ASCII equivalents where possible.
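A quick way to convince yourself of this, sketched with variable names invented for the example:

```python
# Both literals denote the same single byte, 0x25 (decimal 37), which is '%'.
same = b'\x25' == b'%'
print(same)                       # True
print(ord('%'), hex(ord('%')))    # 37 0x25

# Bytes without a printable ASCII equivalent keep their \x form in the repr.
euro_bytes = b'\xe2\x82\xac'
print(euro_bytes)                 # b'\xe2\x82\xac'

# Converting to a list shows the underlying integer values.
euro_values = list(euro_bytes)
print(euro_values)                # [226, 130, 172]
```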

Upvotes: 3
