Peng

Reputation: 1491

Integer object identity test: inconsistent behavior between large positive and small negative integers

I am using Anaconda (Python 3.6).

In interactive mode, I ran an object identity test on positive integers greater than 256:

# Interactive test 1
>>> x = 1000
>>> y = 1000
>>> x is y
False

Clearly, large integers (>256) assigned on separate lines are not reused in interactive mode.

But if we write the assignments on one line, the large positive integer object is reused:

# Interactive test 2
>>> x, y = 1000, 1000
>>> x is y
True

That is, in interactive mode, whether the integer assignments are written on one line or on separate lines makes a difference to whether integer objects (>256) are reused. For integers in [-5, 256] (as described at https://docs.python.org/2/c-api/int.html), the caching mechanism ensures that only one object is created, whether the assignments are on the same line or on different lines.

Now let's consider small negative integers less than -5 (any negative integer outside the range [-5, 256] would serve the purpose). The results are surprising:

# Interactive test 3
>>> x, y = -6, -6
>>> x is y
False     # inconsistent with the large positive integer 1000

>>> -6 is -6
False

>>> id(-6), id(-6), id(-6)
(2280334806256, 2280334806128, 2280334806448)

>>> a = b =-6
>>> a is b
True    # different result from a, b = -6, -6

Clearly, this demonstrates an inconsistency in the object identity test between large positive integers (>256) and small negative integers (<-5). And for small negative integers (<-5), writing a, b = -6, -6 versus a = b = -6 also makes a difference (in contrast, it doesn't matter which form is used for large positive integers). Is there any explanation for this strange behavior?

For comparison, let's move on to an IDE run (I am using PyCharm with the same Python 3.6 interpreter). I ran the following script:

# IDE test case
x = 1000
y = 1000
print(x is y) 

It prints True, unlike the interactive run. Thanks to @Ahsanul Haque, who already gave a nice explanation of the inconsistency between the IDE run and the interactive run. But my question about the inconsistency between large positive integers and small negative integers in the interactive run remains unanswered.

Upvotes: 3

Views: 576

Answers (3)

vaultah

Reputation: 46533

When you run 1000 is 1000 in the interactive shell or as part of a larger script, CPython generates bytecode like this:

In [3]: dis.dis('1000 is 1000')
   ...: 
  1           0 LOAD_CONST               0 (1000)
              2 LOAD_CONST               0 (1000)
              4 COMPARE_OP               8 (is)
              6 RETURN_VALUE

What it does is:

  • Loads two constants (LOAD_CONST pushes co_consts[consti] onto the stack -- docs)
  • Compares them using is (True if operands refer to the same object; False otherwise)
  • Returns the result

As CPython only creates one Python object for a constant used in a code block, 1000 is 1000 will result in a single integer constant being created:

In [4]: code = compile('1000 is 1000', '<string>', 'single') # code object

In [5]: code.co_consts # constants used by the code object
Out[5]: (1000, None)

According to the bytecode above, Python will load that same object twice and compare it with itself, so the expression will evaluate to True:

In [6]: eval(code)
Out[6]: True

The results are different for -6, because -6 is not immediately recognized as a constant:

In [7]: ast.dump(ast.parse('-6'))
Out[7]: 'Module(body=[Expr(value=UnaryOp(op=USub(), operand=Num(n=6)))])'

-6 is an expression negating the value of the integer literal 6.

Nevertheless, the bytecode for -6 is -6 is virtually the same as the first bytecode sample:

In [8]: dis.dis('-6 is -6')
  1           0 LOAD_CONST               1 (-6)
              2 LOAD_CONST               2 (-6)
              4 COMPARE_OP               8 (is)
              6 RETURN_VALUE

So Python loads two -6 constants and compares them using is.

How does the -6 expression become a constant? CPython has a peephole optimizer, capable of optimizing simple expressions involving constants by evaluating them right after the compilation, and storing the results in the table of constants.

As of CPython 3.6, folding unary operations is handled by fold_unaryops_on_constants in Python/peephole.c. In particular, - (unary minus) is evaluated by PyNumber_Negative, which returns a new Python object (-6 is not cached). After that, the newly created object is inserted into the consts table. However, the optimizer does not check whether the result of the expression can be reused, so the results of identical expressions end up being distinct Python objects (again, as of CPython 3.6).

To illustrate this, I'll compile the -6 is -6 expression:

In [9]: code = compile('-6 is -6', '<string>', 'single')

There are two -6 constants in the co_consts tuple

In [10]: code.co_consts
Out[10]: (6, None, -6, -6)

and they have different memory addresses

In [11]: [id(const) for const in code.co_consts if const == -6]
Out[11]: [140415435258128, 140415435258576]

Of course, this means that -6 is -6 evaluates to False:

In [12]: eval(code)
Out[12]: False
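If you want to repeat this check on your own interpreter, here is a minimal sketch. The helper name count_distinct_consts is my own, not part of any library, and it walks into tuple constants because some CPython versions fold x, y = -6, -6 into a single (-6, -6) tuple. I would expect it to report 2 for -6 on the 3.6 interpreter discussed here, but later versions fold constants differently and may report 1, so treat the numbers as something to observe rather than rely on.

```python
def count_distinct_consts(source, value):
    """Count distinct int objects equal to `value` among the constants
    of the code object compiled from `source` (hypothetical helper)."""
    code = compile(source, "<string>", "exec")
    seen = set()

    def walk(consts):
        for const in consts:
            if isinstance(const, tuple):
                walk(const)  # folded tuples such as (-6, -6) hold constants too
            elif type(const) is int and const == value:
                seen.add(id(const))  # ids are comparable while `code` is alive

    walk(code.co_consts)
    return len(seen)


# Results depend on the CPython version, so no expected output is given.
print(count_distinct_consts("x, y = 1000, 1000", 1000))
print(count_distinct_consts("x, y = -6, -6", -6))
```

Using an assignment instead of -6 is -6 as the source avoids the SyntaxWarning that newer CPython versions emit for is with a literal.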

For the most part, the explanation above remains valid in the presence of variables. When executed in the interactive shell, these three lines

>>> x = 1000
>>> y = 1000
>>> x is y
False

are parts of three different code blocks, so the 1000 constant won't be reused. However, if you put them all in one code block (like a function body) the constant will be reused.
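A quick way to see the single-block case is to wrap the assignments in a function. This is only a sketch (same_block is a name I made up), and the result is a CPython implementation detail, not something the language guarantees:

```python
def same_block():
    # Both literals belong to this function's single code object,
    # so CPython stores one 1000 constant and loads it twice.
    x = 1000
    y = 1000
    return x is y


print(same_block())                   # True on CPython
print(same_block.__code__.co_consts)  # 1000 appears only once here
```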

In contrast, the x, y = 1000, 1000 line is always executed in one code block (even in the interactive shell), and therefore CPython always reuses the constant. In x, y = -6, -6, however, the -6 object isn't reused, for the reasons explained in the first part of my answer.

x = y = -6 is trivial. Since there's exactly one Python object involved, x is y would return True even if you replaced -6 with something else.
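To make the difference between the two assignment forms concrete without involving compile-time constants at all, here is a small sketch that builds the integers at runtime with int(). The variable names are arbitrary:

```python
a = b = int("-6")              # int() runs once; both names bind to its result
c, d = int("-6"), int("-6")    # int() runs twice; each call returns its own result

print(a is b)   # True: only one object was ever created
print(c == d)   # True: the values are equal
print(c is d)   # False on CPython, since -6 lies outside the small-integer cache
```

The first result holds for any object, which is why the chained form says nothing about constant reuse.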

Upvotes: 2

Ahasanul Haque

Reputation: 11134

Only one copy of a particular constant is created for a given piece of source code, and it is reused wherever it is needed again. That is why, in PyCharm, x is y evaluates to True.

In the interpreter, things are different. There, only one line/statement runs at a time. A particular constant is created anew for each line and is not reused in the next line. So x is not y there.

But if you initialize both variables on the same line, you get the same behavior (the same constant is reused).

>>> x,y = 1000, 1000
>>> x is y
True
>>> x = 1000
>>> y = 1000
>>> x is y
False
>>> 

Edit:

A block is a piece of Python program text that is executed as a unit.

In an IDE, the whole module gets executed at once, i.e. the whole module is a block. But in interactive mode, each instruction is a block of code of its own, executed as soon as it is entered.

As I said earlier, a particular constant is created once for a block of code and reused if it reappears in that same block.

This is the main difference between the IDE and the interpreter.
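You can imitate both situations from a single script with exec, which compiles each string it receives as a separate block. This is a rough sketch of the idea, not exactly what the interactive shell does internally, and the dictionary names are just placeholders:

```python
separate = {}
exec("x = 1000", separate)   # compiled as its own block
exec("y = 1000", separate)   # compiled as another block

together = {}
exec("x = 1000\ny = 1000", together)  # one block, like a module run from an IDE

# Typically False on CPython: each block created its own 1000 object.
print(separate["x"] is separate["y"])
# True on CPython: the constant is shared within the block.
print(together["x"] is together["y"])
```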

Then why does the interpreter give the same output as the IDE for smaller numbers? This is where integer caching comes into play.

If the numbers are small, they are cached and reused in the next code block. So we get the same id, just as in the IDE.

But if they are bigger, they are not cached. Instead, a new copy is created. So, as expected, the id is different.
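Here is a small sketch of the cache itself. It creates the integers at runtime with int() so that constant reuse within a block plays no part. The results are CPython implementation details, and the exact cache range could differ in other versions or implementations:

```python
small_a, small_b = int("256"), int("256")
big_a, big_b = int("1000000"), int("1000000")

print(small_a is small_b)  # True on CPython: 256 comes from the small-integer cache
print(big_a is big_b)      # False on CPython: each call builds a new object
```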

Hope this makes sense now.

Upvotes: 5

Sidon

Reputation: 1366

To complement Ahsanul Haque's answer, try this in any IDE:

x = 1000
y = 1000
print (x is y)
print('\ninitial id x: ',id(x))
print('initial id y: ',id(y))

x=2000
print('\nid x after change value:   ',id(x))
print('id y after change x value: ', id(y))

initial id x:  139865953872336
initial id y:  139865953872336

id x after change value:    139865953872304
id y after change x value:  139865953872336

Very likely you will see the same ID for 'x' and 'y'. Then run the code in the interpreter, and the IDs will be different.

>x=1000
>y=1000

>id(x)
=> 139865953870576
>id(y)
=> 139865953872368

See Here.

Upvotes: 0
