Reputation: 1491
I am using Anaconda (Python 3.6).
In interactive mode, I ran an object identity test for positive integers greater than 256:
# Interactive test 1
>>> x = 1000
>>> y = 1000
>>> x is y
False
Clearly, large integers (>256) assigned on separate lines are not reused in interactive mode.
But if we write the assignments on one line, the large positive integer object is reused:
# Interactive test 2
>>> x, y = 1000, 1000
>>> x is y
True
That is, in interactive mode, writing the integer assignments on one line or on separate lines determines whether the integer objects (>256) are reused. For integers in [-5, 256] (as described at https://docs.python.org/2/c-api/int.html), the caching mechanism ensures that only one object is created, whether the assignments are on the same line or on different lines.
Now let's consider small negative integers less than -5 (any negative integer outside the range [-5, 256] would serve the purpose). The results are surprising:
# Interactive test 3
>>> x, y = -6, -6
>>> x is y
False # inconsistent with the large positive integer 1000
>>> -6 is -6
False
>>> id(-6), id(-6), id(-6)
(2280334806256, 2280334806128, 2280334806448)
>>> a = b =-6
>>> a is b
True # different result from a, b = -6, -6
Clearly, this demonstrates an inconsistency in the object identity test between large positive integers (>256) and small negative integers (<-5). And for small negative integers (<-5), writing a, b = -6, -6 versus a = b = -6 also makes a difference (in contrast, it doesn't matter which form is used for large positive integers). Are there any explanations for these strange behaviors?
For comparison, let's move on to running from an IDE (I am using PyCharm with the same Python 3.6 interpreter). I run the following script:
# IDE test case
x = 1000
y = 1000
print(x is y)
It prints True, unlike the interactive run. Thanks to @Ahsanul Haque, who already gave a nice explanation of the inconsistency between the IDE run and the interactive run. But my question about the inconsistency between large positive integers and small negative integers in the interactive run remains unanswered.
Upvotes: 3
Views: 576
Reputation: 46533
When you run 1000 is 1000 in the interactive shell or as part of a bigger script, CPython generates bytecode like this:
In [3]: dis.dis('1000 is 1000')
...:
1 0 LOAD_CONST 0 (1000)
2 LOAD_CONST 0 (1000)
4 COMPARE_OP 8 (is)
6 RETURN_VALUE
What it does is load two constants and compare them using is (True if the operands refer to the same object; False otherwise).
As CPython only creates one Python object for a constant used in a code block, 1000 is 1000 will result in a single integer constant being created:
In [4]: code = compile('1000 is 1000', '<string>', 'single') # code object
In [5]: code.co_consts # constants used by the code object
Out[5]: (1000, None)
According to the bytecode above, Python will load that same object twice and compare it with itself, so the expression will evaluate to True:
In [6]: eval(code)
Out[6]: True
The results are different for -6, because -6 is not immediately recognized as a constant:
In [7]: ast.dump(ast.parse('-6'))
Out[7]: 'Module(body=[Expr(value=UnaryOp(op=USub(), operand=Num(n=6)))])'
-6 is an expression negating the value of the integer literal 6.
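The same structure can be checked programmatically. This is a minimal sketch that uses only the standard ast module; the helper name describe_literal is made up for this illustration, and the exact name of the literal node (Num or Constant) depends on the Python version:

```python
import ast

def describe_literal(source):
    """Return the AST node type names for a one-expression source string."""
    # ast.parse does not fold constants by default, so the raw structure is visible.
    expr = ast.parse(source, mode="eval").body
    if isinstance(expr, ast.UnaryOp):
        # Report both the node type and the operator type, e.g. UnaryOp / USub.
        return type(expr).__name__, type(expr.op).__name__
    return (type(expr).__name__,)

print(describe_literal("-6"))    # unary minus applied to the literal 6
print(describe_literal("1000"))  # a plain literal node
```

If the first call reports a UnaryOp, the parser did not see -6 as a single literal, which is why a later optimization step is needed to turn it into a constant.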
Nevertheless, the bytecode for -6 is -6 is virtually the same as the first bytecode sample:
In [8]: dis.dis('-6 is -6')
1 0 LOAD_CONST 1 (-6)
2 LOAD_CONST 2 (-6)
4 COMPARE_OP 8 (is)
6 RETURN_VALUE
So Python loads two -6 constants and compares them using is.
How does the -6 expression become a constant? CPython has a peephole optimizer, capable of optimizing simple expressions involving constants by evaluating them right after compilation and storing the results in the table of constants.
As of CPython 3.6, folding unary operations is handled by fold_unaryops_on_constants in Python/peephole.c. In particular, - (unary minus) is evaluated by PyNumber_Negative, which returns a new Python object (-6 is not cached). After that, the newly created object is inserted into the consts table. However, the optimizer does not check whether the result of the expression can be reused, so the results of identical expressions end up being distinct Python objects (again, as of CPython 3.6).
To illustrate this, I'll compile the -6 is -6 expression:
In [9]: code = compile('-6 is -6', '<string>', 'single')
There are two -6 constants in the co_consts tuple
In [10]: code.co_consts
Out[10]: (6, None, -6, -6)
and they have different memory addresses
In [11]: [id(const) for const in code.co_consts if const == -6]
Out[11]: [140415435258128, 140415435258576]
Of course, this means that -6 is -6 evaluates to False:
In [12]: eval(code)
Out[12]: False
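To see how a particular interpreter treats the folded -6 constants, one can count them directly. The sketch below is an illustration, not part of the CPython 3.6 session above: the helper count_equal_consts is a made-up name, the assignments are written as x = -6 and y = -6 in one block to avoid comparing literals with is, and the printed values may differ between CPython versions because constant folding was reworked after 3.6.

```python
def count_equal_consts(source, value):
    """Compile `source` as one code block and count constants equal to `value`."""
    code = compile(source, "<demo>", "exec")
    # Compare the type as well, so that e.g. -6.0 is not counted as -6.
    return sum(1 for c in code.co_consts if type(c) is type(value) and c == value)

source = "x = -6\ny = -6"
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)

# How many separate -6 entries ended up in the constants table?
print("number of -6 constants:", count_equal_consts(source, -6))
# Do both names refer to one object? This is interpreter-dependent.
print("x is y:", namespace["x"] is namespace["y"])
```

A count of 2 matches the behavior described for CPython 3.6; a count of 1 would mean the interpreter merged the folded results into a single constant.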
For the most part, the explanation above remains valid in the presence of variables. When executed in the interactive shell, these three lines
>>> x = 1000
>>> y = 1000
>>> x is y
False
are parts of three different code blocks, so the 1000 constant won't be reused. However, if you put them all in one code block (like a function body), the constant will be reused.
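The function-body case can be sketched as follows. This relies on a CPython implementation detail (one object per distinct constant in a code object), and the function name same_block is chosen only for this illustration:

```python
def same_block():
    # Both literals live in one code object, so CPython stores 1000 only once.
    x = 1000
    y = 1000
    return x is y

# The constants table of the function's code object holds a single 1000.
print(same_block.__code__.co_consts)
print(same_block())
```

Typing the same two assignments as separate statements at the interactive prompt compiles each one as its own code object, which is why the identity test can give a different answer there.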
In contrast, the x, y = 1000, 1000 line is always executed in one code block (even in the interactive shell), and therefore CPython always reuses the constant. In x, y = -6, -6, -6 isn't reused for the reasons explained in the first part of my answer.
x = y = -6 is trivial. Since there's exactly one Python object involved, x is y would return True even if you replaced -6 with something else.
Upvotes: 2
Reputation: 11134
Only one copy of a particular constant is created for a given piece of source code, and it is reused wherever it is needed again. So, in PyCharm, you get x is y == True.
But in the interpreter, things are different. Here, only one line/statement runs at a time. A new constant is created for each new line and is not reused on the next line. So, x is not y here.
But if you initialize both on the same line, you get the same behavior (the same constant is reused).
>>> x,y = 1000, 1000
>>> x is y
True
>>> x = 1000
>>> y = 1000
>>> x is y
False
>>>
Edit:
A block is a piece of Python program text that is executed as a unit.
In an IDE, the whole module gets executed at once, i.e. the whole module is a block. But in interactive mode, each instruction is a block of code of its own that is executed at once.
As I said earlier, a particular constant is created once for a block of code and reused if it reappears in that block.
This is the main difference between the IDE and the interpreter.
Then why does the interpreter give the same output as the IDE for smaller numbers? This is where integer caching comes into play.
If numbers are small, they are cached and reused in the next code block. So, we get the same id as in the IDE.
But if they are bigger, they are not cached. Instead, a new copy is created. So, as expected, the id is different.
Hope this makes sense now.
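The caching effect can be isolated from compile-time constant reuse by building the integers at run time. This is a minimal sketch that assumes CPython's small-integer cache covering [-5, 256]; the helper name built_at_runtime is made up for this illustration:

```python
def built_at_runtime(n):
    """Create two equal ints at run time and report whether they are one object."""
    # int(str(n)) bypasses compile-time constants, so only a runtime cache
    # can make the two results identical.
    a = int(str(n))
    b = int(str(n))
    return a is b

# Values at the edges of the cached range and just outside it.
for n in (-5, 256, 257, -6):
    print(n, built_at_runtime(n))
```

Values inside the cached range should come back as the same object, while values just outside it are built afresh each time on the CPython versions discussed here.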
Upvotes: 5
Reputation: 1366
To complement Ahsanul Haque's answer, try this in any IDE:
x = 1000
y = 1000
print (x is y)
print('\ninitial id x: ',id(x))
print('initial id y: ',id(y))
x=2000
print('\nid x after change value: ',id(x))
print('id y after change x value: ', id(y))
initial id x: 139865953872336
initial id y: 139865953872336
id x after change value: 139865953872304
id y after change x value: 139865953872336
Very likely you will see the same id for 'x' and 'y'. Then run the code in the interpreter, and the ids will be different.
>x=1000
>y=1000
>id(x)
=> 139865953870576
>id(y)
=> 139865953872368
Upvotes: 0