Confused about Python byte literals

Question

In Python 3, this is what I know about bytes literals:

First, whenever a single byte can be represented by a printable ASCII character, it is shown as that character, e.g. b'\x63\xf8' is displayed as b'c\xf8'.
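Here is a quick check of the first property, as a minimal sketch. `==` compares byte values regardless of how the literal was spelled, while `repr()` chooses the display form. Bytes in the printable ASCII range are generally shown as characters (a few, such as the backslash, tab and newline, are still escaped), and other bytes appear as `\xNN`.

```python
# Two spellings of the same two bytes: 0x63 is ASCII 'c', 0xf8 is outside ASCII.
data = b'\x63\xf8'

assert data == b'c\xf8'   # equality compares byte values, not spelling
print(list(data))         # [99, 248]
print(repr(data))         # b'c\xf8'  (the printable byte is shown as 'c')
```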
Second, I recently found something strange. It seems that a number after \ (if every digit is less than 8 and there are at most three digits) is interpreted as octal, e.g. b'\123' == b'S', which suggests the largest value is b'\377' == b'\xff'. However, b'\474' == b'<' and b'\574' == b'|'.
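The numbers above are consistent with one simple rule: the escape's octal value is reduced to its lowest 8 bits (that is, taken modulo 256). The sketch below only does that arithmetic. It deliberately avoids writing b'\474' in source, since newer interpreters may warn about such literals. The helper name `low_byte_of_octal` is hypothetical.

```python
def low_byte_of_octal(digits: str) -> bytes:
    """Hypothetical helper modelling the observed result of a 3-digit octal escape.

    Interprets `digits` as octal and keeps only the lowest 8 bits.
    """
    value = int(digits, 8)        # '474' -> 316
    return bytes([value & 0xFF])  # 316 & 0xFF == 60 -> b'<'


for digits in ("123", "377", "474", "574"):
    value = int(digits, 8)
    print("\\" + digits, value, value & 0xFF, low_byte_of_octal(digits))

# \123 83 83 b'S'
# \377 255 255 b'\xff'
# \474 316 60 b'<'
# \574 380 124 b'|'
```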

Can anyone explain the strange behavior in the second property? What is the general rule for a byte written as \ followed by a number?
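As for the general rule: in a bytes literal, a backslash followed by one to three octal digits (0-7) is an octal escape. The parser takes as many octal digits as it can, up to three, so a fourth digit, or an 8 or 9, is just an ordinary character. The decoder below is a simplified, hypothetical sketch (`decode_octal_escapes` and `OCTAL_ESCAPE` are illustrative names). It assumes the input contains only octal escapes, ignores \\, \n, \x and the rest, and applies the same low-8-bit truncation for values above \377.

```python
import re

# Hypothetical simplified decoder: handles ONLY octal escapes (\o, \oo, \ooo).
# Other escapes such as \\, \n or \x41 are left uninterpreted.
OCTAL_ESCAPE = re.compile(rb"\\([0-7]{1,3})")  # greedy: up to three octal digits


def decode_octal_escapes(raw: bytes) -> bytes:
    """Replace each octal escape in `raw` with the byte it denotes."""
    def to_byte(match: re.Match) -> bytes:
        value = int(match.group(1), 8)  # int() accepts bytes digits with a base
        return bytes([value & 0xFF])    # keep the lowest 8 bits, as observed for \474
    return OCTAL_ESCAPE.sub(to_byte, raw)


# Compare against real literals that stay within \0..\377:
assert decode_octal_escapes(rb"\123") == b"\123" == b"S"
assert decode_octal_escapes(rb"\1234") == b"\1234" == b"S4"  # only three digits consumed
assert decode_octal_escapes(rb"\18") == b"\18" == b"\x018"   # 8 is not an octal digit
assert decode_octal_escapes(rb"\0") == b"\0" == b"\x00"
```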

9769953 · Accepted Answer

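This is a bug in the CPython implementation: octal escapes in bytes literals are not range-checked, and values above \377 are silently truncated to their lowest 8 bits (the value modulo 256). For example, \474 is 316, and 316 - 256 = 60, which is '<'.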

The documentation about bytes states the following:

While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256 (attempts to violate this restriction will trigger ValueError).

Strictly speaking, this passage is about bytes objects, not bytes literals, so extending it to literals could be seen as invalid: b'\407' does not have to behave the same as bytes((0o407,)). Indeed, it currently does not: the literal silently evaluates to b'\x07', while the constructor raises ValueError.
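The contrast can be seen by building the same value through the `bytes` constructor, which enforces the documented 0 <= x < 256 range. The masked value at the end is what the affected literal b'\407' produced (0o407 is 263, and 263 & 0xFF is 7). The literal itself is left out of this sketch because its handling varies between interpreter versions.

```python
value = 0o407                  # 263 in decimal: too large for one byte

try:
    bytes((value,))            # the constructor enforces 0 <= x < 256
    rejected = False
except ValueError as exc:
    rejected = True
    print("bytes() rejects it:", exc)

# Keeping only the lowest 8 bits gives the byte the affected literal produced.
masked = bytes((value & 0xFF,))
print(masked)                  # b'\x07'
```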

This behaviour was filed as a bug only fairly recently (June 2019). A fix is incoming, but appears to be scheduled for Python 3.9 only, and will raise a ValueError for b'\407'.
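Since the outcome depends on which interpreter version you run, one way to find out is to compile the literal from a string at runtime and record any warnings or errors. This is a sketch: `probe_literal` is a hypothetical helper, and it reports whatever the running interpreter does rather than assuming any particular version's behaviour.

```python
import warnings


def probe_literal(source: str):
    """Hypothetical helper: compile and evaluate `source`, reporting the outcome.

    Returns ("value", result, warning_names) on success, or
    ("error", exception_name, warning_names) if compiling/evaluating fails.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        try:
            result = eval(compile(source, "<probe>", "eval"))
            outcome = ("value", result)
        except (SyntaxError, ValueError) as exc:
            outcome = ("error", type(exc).__name__)
    return outcome + ([w.category.__name__ for w in caught],)


# The source text uses an escaped backslash, so this file itself
# contains no out-of-range octal escape.
print(probe_literal("b'\\407'"))
```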

It is unclear to me whether there will be bugfix releases for existing versions (including 3.8, which is in beta); possibly not, because this may change results for software that relies on this behaviour.

(For the more curious: here's the current submitted patch: https://github.com/python/cpython/pull/14654 .)
