Reputation: 11097
I would like to better understand the Python 3.x data model, but I cannot find a complete and precise explanation of how Python objects behave.
I am looking for references; it would be great if every case shown below could be linked to a Python API reference, a PEP, or any other valuable source. Thank you in advance for your advice.
Let's say we have a complex Python structure for testing purposes:
d1 = {
    'id': 5432,
    'name': 'jlandercy',
    'pets': {
        'andy': {
            'type': 'cat',
            'age': 3.5,
        },
        'ray': {
            'type': 'dog',
            'age': 6.5,
        },
    },
    'type': str,
    'complex': (5432, 6.5, 'cat', str),
    'list': ['milk', 'chocolate', 'butter'],
}
1) Immutable atomic objects are singletons
However I create a new integer:
import copy
n1 = 5432
n2 = int(5432)
n3 = copy.copy(n1)
n4 = copy.deepcopy(n1)
no new copy of this number is created; instead, every name points to the same object as d1['id']. More concisely:
d1['id'] is n1
...
They all have the same id. I cannot create a new instance of int with value 5432, therefore it is a singleton.
2) Immutable and Iterable objects might be singletons...
The previous observation also holds for str objects, which are immutable and iterable. All of the following variables:
s1 = 'jlandercy'
s2 = str('jlandercy')
s3 = copy.copy(s1)
s4 = copy.deepcopy(s1)
point to the object initially created for d1['name']. Strings are also singletons.
...but not exactly...
Tuples are also immutable and iterable, but they do not behave like strings. It is known that the magic empty tuple is a singleton:
() is ()
But other tuples are not.
t1 = (5432, 6.5, 'cat', str)
...instead they hash equally
They do not have the same id:
id(d1['complex']) != id(t1)
But all items within those two structures are atomic, so they point to the same instances. The important point is that both structures hash the same way:
hash(d1['complex']) == hash(t1)
So they can be used as dictionary keys. This is even true for nested tuples:
t2 = (1, (2, 3))
t3 = (1, (2, 3))
They have the same hash.
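As a small illustration of the point above (a sketch only; the names `key_a`, `key_b` and `prices` are made up for this example), two tuples built separately are distinct objects, yet they compare equal, hash equally, and therefore address the same dictionary entry:

```python
# Build two equal nested tuples at runtime so the compiler cannot merge them.
key_a = tuple([1, tuple([2, 3])])
key_b = tuple([1, tuple([2, 3])])

distinct_objects = key_a is not key_b      # separate objects on CPython
equal_values = key_a == key_b              # compared element by element
equal_hashes = hash(key_a) == hash(key_b)  # equal objects must hash equally

# Either tuple retrieves the entry stored under the other one.
prices = {key_a: "found"}
lookup = prices[key_b]

print(distinct_objects, equal_values, equal_hashes, lookup)
```

The equal hash is a consequence of equality rather than of identity: hashable objects that compare equal are required to have the same hash value.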
3) Passing dictionary by double dereferencing works as shallow copy of it
Let's define the following function:
def f1(**kwargs):
    kwargs['id'] = 1111
    kwargs['pets']['andy'] = None
It receives our trial dictionary by double dereferencing (the ** operator): first-level members are copied, but deeper ones are passed by reference.
The output of this simple program illustrates it:
print(d1)
f1(**d1)
print(d1)
It returns:
{'complex': (5432, 6.5, 'cat', <class 'str'>),
'id': 5432,
'list': ['milk', 'chocolate', 'butter'],
'name': 'jlandercy',
'pets': {'andy': {'age': 3.5, 'type': 'cat'},
'ray': {'age': 6.5, 'type': 'dog'}},
'type': <class 'str'>}
{'complex': (5432, 6.5, 'cat', <class 'str'>),
'id': 5432,
'list': ['milk', 'chocolate', 'butter'],
'name': 'jlandercy',
'pets': {'andy': None, 'ray': {'age': 6.5, 'type': 'dog'}},
'type': <class 'str'>}
The dictionary d1 has been modified by function f1, but not completely. Member id is preserved because we worked on a copy, but member pets is itself a dictionary that the shallow copy did not copy, so it has been modified.
This behaviour is similar to that of copy.copy for dict objects, whereas copy.deepcopy is needed to get a recursive and complete copy of an object.
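The difference between the two copy functions can be sketched as follows (a minimal example; `original`, `shallow` and `deep` are illustrative names, and the dictionary is a cut-down version of d1):

```python
import copy

original = {"id": 5432, "pets": {"andy": {"type": "cat"}}}

shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Shallow copy: a new outer dict, but the nested values are shared.
shallow_shares_pets = shallow["pets"] is original["pets"]
# Deep copy: nested containers are duplicated recursively.
deep_shares_pets = deep["pets"] is original["pets"]

# Mutating through the shallow copy is visible in the original,
# while the deep copy is unaffected.
shallow["pets"]["andy"] = None
print(shallow_shares_pets, deep_shares_pets)
print(original["pets"], deep["pets"])
```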
My requests are:
Are my observations correctly interpreted?
Immutable atomic objects are singletons
Immutable and iterable objects might be singletons, but not exactly; instead they hash equally
Passing dictionary by double dereferencing works as shallow copy of it
Upvotes: 2
Views: 1725
Reputation: 160677
Immutable atomic objects are singletons
Nope, some are and some aren't; this is a detail of the CPython implementation.
Integers in the range [-5, 256] are cached, and when a new request for one of these is made, the already existing object is returned. Numbers outside that range are subject to constant folding, where the interpreter re-uses constants during compilation as a slight optimization. This is documented in the section on creating new PyLong objects.
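To see the cache without compile-time constant re-use getting in the way, one option is to build the integers at runtime. This is only a sketch of CPython behaviour (the variable names are illustrative); other implementations may give different identity results:

```python
# Build the integers from strings at runtime so that the compiler
# cannot share a single constant between the two names.
small_a, small_b = int("256"), int("256")
large_a, large_b = int("5432"), int("5432")

small_same_object = small_a is small_b   # True on CPython (cached small int)
large_same_object = large_a is large_b   # False on CPython (two objects)

# Equality is what the language guarantees; identity is not.
print(small_same_object, large_same_object, large_a == large_b)
```

This suggests that 5432 is not a singleton: the identity observed in the question comes from how the objects happened to be created, not from the value itself.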
Also, see the following for a discussion on these:
String literals are subject to interning during compilation to bytecode, as ints are. The rules governing this are not as simple as for ints, though: only strings under a certain size and composed of certain characters are considered. I am not aware of any section in the docs specifying this; you can take a look at the behavior by reading here.
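A rough way to observe this, assuming CPython, is to create equal strings at runtime (so they are not literals) and then intern them explicitly with `sys.intern`; the variable names below are illustrative:

```python
import sys

# Two equal strings created at runtime are normally separate objects.
s_a = "".join(["jlan", "dercy"])
s_b = "".join(["jlan", "dercy"])
same_before = s_a is s_b    # False on CPython

# sys.intern returns the canonical object for a given string value.
i_a = sys.intern(s_a)
i_b = sys.intern(s_b)
same_after = i_a is i_b     # True

print(same_before, same_after, s_a == s_b)
```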
For floats, for example, which could be considered "atomic" (even though in Python that term doesn't have the meaning you think), there are no singletons:
i = 1.0
j = 1.0
i is j # False
They are still, of course, subject to constant folding, as you can see by reading: 'is' operator behaves unexpectedly with floats
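Both effects can be put side by side in a short sketch (CPython behaviour, not a language guarantee; `namespace` and the other names are illustrative). Floats created at runtime are separate objects, while equal float literals compiled together in one unit end up as a single shared constant:

```python
# Floats created at runtime: equal values, separate objects.
x = float("1.0")
y = float("1.0")
runtime_same = x is y               # False on CPython

# Floats written as literals in one compiled unit: CPython stores the
# constant once per code object, so both names are bound to it.
namespace = {}
exec(compile("i = 1.0\nj = 1.0\nsame = i is j", "<demo>", "exec"), namespace)
compiled_same = namespace["same"]   # True on CPython

print(runtime_same, compiled_same, x == y)
```

This is why the same two assignments can give False when typed line by line in an interactive session and True when run from a script.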
Immutable and iterable objects might be singletons, but not exactly; instead they hash equally
Empty immutable collections are singletons; this is again an implementation detail that can't be found in the Python Reference and can only be discovered by looking at the source.
See here for a look at the implementation: Why does '() is ()' return True when '[] is []' and '{} is {}' return False?
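A quick check of the empty tuple case, again as observed on CPython rather than promised by the language (the variable names are illustrative):

```python
# The empty tuple is shared on CPython, so both calls return one object.
empty_t1 = tuple()
empty_t2 = tuple()
empty_tuples_same = empty_t1 is empty_t2   # True on CPython

# Lists are mutable, so each call has to produce a fresh object.
empty_l1 = list()
empty_l2 = list()
empty_lists_same = empty_l1 is empty_l2    # False

print(empty_tuples_same, empty_lists_same)
```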
Passing dictionary by double dereferencing works as shallow copy of it.
Yes. Though the term isn't double dereferencing, it is unpacking.
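The shallow nature of unpacking can be checked directly. In this sketch, `inspect_kwargs` is a hypothetical helper written for the example; it receives the original dictionary as well, so it can compare it with the `kwargs` built by the call:

```python
def inspect_kwargs(original, **kwargs):
    """Report how kwargs relates to the dict it was unpacked from."""
    return {
        # The call builds a brand-new dict for kwargs...
        "new_outer_dict": kwargs is not original,
        # ...but its values are the very same objects.
        "shared_nested": kwargs["pets"] is original["pets"],
    }


d = {"id": 5432, "pets": {"andy": {"type": "cat"}}}
report = inspect_kwargs(d, **d)
print(report)
```

Rebinding `kwargs['id']` therefore only affects the new dictionary, while mutating `kwargs['pets']` reaches the object the caller still holds.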
Are those behaviours well documented somewhere?
Those that are considered implementation details needn't be documented the way you'd find documentation for the max function, for example. These are specific things that might easily change if the decision is made to do so.
Upvotes: 4