Odd behaviour of Python tokenizer when parsing integer numbers

Question

I noticed the following behaviour in CPython 3 and PyPy3, which differs from that of both CPython 2 and PyPy2:

In Python 3, it looks like a leading zero in an integer literal is a syntax error, with a single exception: the number 0 itself. Thus 00 is valid but 042 is not.

In Python2, leading zeros are allowed for all integers. Thus 00 and 042 are valid.

Why did Python change its behaviour between the two versions?

chepner · Accepted Answer

Python 3 standardized how all integer literals (other than base 10) are defined: 0?dddd..., where ? is a single letter indicating the base, and each d is interpreted as a digit in the appropriate base. A literal made up only of zeros (0, 00, ...) was retained as an exception, since 0 is 0 in any base, and it was accepted as such before explicit base specifiers were required.
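As a small illustration of the 0?dddd... form, the sketch below spells the same value in each prefixed base and then shows the all-zeros exception. The variable names are only for illustration.

```python
# Prefixed literals: 0b (binary), 0o (octal), 0x (hexadecimal).
binary_value = 0b100010
octal_value = 0o42
hex_value = 0x22

# All three spell the same integer, decimal 34.
same_value = binary_value == octal_value == hex_value == 34

# Zero is the one exception: a run of zeros is still a valid literal.
zero_value = 00
```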

The biggest change related to this is that a number with a leading zero but no explicit base specifier is no longer assumed to be an octal number. Python 2 accepts both 042 and 0o42 as octal representations of decimal 34. (Early in Python's history, there were only three valid kinds of integer literal, with hexadecimal being the only one with a specifier. 0o... and 0b... were both later additions.)
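One way to check this from inside a program is to hand each literal to the built-in compile() and see whether it raises SyntaxError. This is a minimal sketch that assumes it is run under Python 3; the helper name is_valid_literal is made up for the example.

```python
def is_valid_literal(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# Under Python 3, only "042" is rejected; the others compile.
results = {text: is_valid_literal(text) for text in ("0", "00", "042", "0o42")}
```

Under Python 2 the same check would accept all four strings, with 042 read as octal.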
