jlandercy

Deeper understanding of Python object mechanisms

I would like to better understand the Python 3.x data model, but I cannot find a complete and precise explanation of Python object behaviour.

I am looking for references: it would be great if every case I show below could be linked to a Python API reference, a PEP, or anything else valuable. Thanks in advance for your wise advice...

Let's say we have a complex Python structure for testing purposes:

d1 = {
    'id': 5432
   ,'name': 'jlandercy'
   ,'pets': {
        'andy': {
            'type': 'cat'
           ,'age': 3.5
        }
       ,'ray': {
            'type': 'dog'
           ,'age': 6.5
        }
    }
   ,'type': str
   ,'complex': (5432, 6.5, 'cat', str)
   ,'list': ['milk', 'chocolate', 'butter']
}

1) Immutable atomic objects are singletons

Whatever way I create a new integer:

import copy

n1 = 5432
n2 = int(5432)
n3 = copy.copy(n1)
n4 = copy.deepcopy(n1)

No new copy of this number is created; instead, each name points to the same object as d1['id']. More concisely:

 d1['id'] is n1
 ...

They all have the same id; I cannot create a new instance of int with value 5432, therefore it is a singleton.
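A minimal check of the cases above, assuming CPython: int() on an exact int, copy.copy and copy.deepcopy all hand back the very object they receive, so none of those calls creates a new int. An equal int built at run time (here parsed from a string, n5 is a hypothetical extra name) is a separate object, which suggests 5432 is not a true singleton.

```python
import copy

n1 = 5432

# int() on an exact int, copy.copy and copy.deepcopy all return the argument itself
print(int(n1) is n1)            # True in CPython
print(copy.copy(n1) is n1)      # True
print(copy.deepcopy(n1) is n1)  # True

# An equal value created at run time is a different object,
# because 5432 lies outside CPython's small-int cache
n5 = int('5432')
print(n5 == n1, n5 is n1)       # True False in CPython
```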

2) Immutable and Iterable objects might be singletons...

The previous observation also holds for str, which is immutable and iterable. All of the following variables:

s1 = 'jlandercy'
s2 = str('jlandercy')
s3 = copy.copy(s1)
s4 = copy.deepcopy(s1)

point to the object initially created for d1['name']. Strings are also singletons.
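A short sketch of the same experiment for strings, assuming CPython. str(), copy.copy and copy.deepcopy return the object they are given, but an equal string assembled at run time (the name built is hypothetical) is a new object until it is passed through sys.intern, which returns one canonical copy per value.

```python
import copy
import sys

s1 = 'jlandercy'

# str(), copy.copy and copy.deepcopy return the same str object in CPython
print(str(s1) is s1, copy.copy(s1) is s1, copy.deepcopy(s1) is s1)  # True True True

# An equal string assembled at run time is a new object...
built = ''.join(['jland', 'ercy'])
print(built == s1, built is s1)  # True False

# ...but sys.intern maps equal strings to one canonical object
print(sys.intern(built) is sys.intern(s1))  # True
```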

...but not exactly...

Tuples are also immutable and iterable, but they do not behave like strings. It is known that the magic empty tuple is a singleton:

() is ()

But other tuples are not.

t1 = (5432, 6.5, 'cat', str)

...instead they hash equally

They do not have the same id:

id(d1['complex']) != id(t1)

But all items within those two structures are atomic, so they point to the same instances. The important point is that both structures hash the same way:

hash(d1['complex']) == hash(t1)

So they can be used interchangeably as dictionary keys. This is even true for nested tuples:

t2 = (1, (2, 3))
t3 = (1, (2, 3))

They do have the same hash.
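A small sketch of why equal hashes matter: a dict lookup only needs an equal key with an equal hash, not the same object. Whether t2 is t3 holds depends on how the code was compiled, so this only compares equality and hashes. It also shows that hashability depends on the items: a tuple holding a list is rejected (lookup and key_ok are illustrative names).

```python
t2 = (1, (2, 3))
t3 = (1, (2, 3))

# Equal tuples have equal hashes, so either one finds the same dictionary entry
lookup = {t2: 'nested'}
print(t2 == t3, hash(t2) == hash(t3), lookup[t3])  # True True nested

# Hashability depends on the items: a tuple holding a list cannot be a key
try:
    hash((1, [2, 3]))
    key_ok = True
except TypeError as exc:
    key_ok = False
    print('unhashable:', exc)
```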

3) Passing a dictionary by double dereferencing works as a shallow copy of it

Let's define the following function:

def f1(**kwargs):
    kwargs['id'] = 1111
    kwargs['pets']['andy'] = None

It will receive our trial dictionary by double dereferencing (the ** operator): first-level members are copied, but deeper ones are passed by reference.
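A minimal sketch of what the ** call hands to the function, using a trimmed-down version of d1 (the function name receive and the dict d are hypothetical): the callee gets a brand-new top-level dict whose values are the very same objects as the caller's.

```python
def receive(**kwargs):
    """Return the dict built for **kwargs so it can be compared with the caller's."""
    return kwargs

d = {'id': 5432, 'pets': {'andy': {'type': 'cat', 'age': 3.5}}}
kw = receive(**d)

print(kw == d, kw is d)          # True False: a new top-level dict
print(kw['pets'] is d['pets'])   # True: nested values are shared, not copied
```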

The output of this simple program illustrates it:

from pprint import pprint

pprint(d1)
f1(**d1)
pprint(d1)

It prints:

{'complex': (5432, 6.5, 'cat', <class 'str'>),
 'id': 5432,
 'list': ['milk', 'chocolate', 'butter'],
 'name': 'jlandercy',
 'pets': {'andy': {'age': 3.5, 'type': 'cat'},
          'ray': {'age': 6.5, 'type': 'dog'}},
 'type': <class 'str'>}

{'complex': (5432, 6.5, 'cat', <class 'str'>),
 'id': 5432,
 'list': ['milk', 'chocolate', 'butter'],
 'name': 'jlandercy',
 'pets': {'andy': None, 'ray': {'age': 6.5, 'type': 'dog'}},
 'type': <class 'str'>}

The dictionary d1 has been modified by function f1, but not completely. Member 'id' is unchanged because the function worked on a copy, but member 'pets' is itself a dictionary, which the shallow copy did not duplicate, so it has been modified.

This behaviour is similar to that of copy.copy for a dict object, whereas we need copy.deepcopy to get a recursive, complete copy of the object.
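A short comparison of the two copy functions on a trimmed-down dict (d, shallow and deep are illustrative names): both produce a new top-level dict, but only copy.deepcopy also duplicates the nested 'pets' dict, so only mutations through the shallow copy leak back into the original.

```python
import copy

d = {'id': 5432, 'pets': {'andy': {'type': 'cat', 'age': 3.5}}}

shallow = copy.copy(d)
deep = copy.deepcopy(d)

# Both are new top-level dicts
print(shallow is d, deep is d)                                  # False False
# Only the shallow copy still shares the nested 'pets' dict
print(shallow['pets'] is d['pets'], deep['pets'] is d['pets'])  # True False

deep['pets']['andy'] = None      # leaves d untouched
shallow['pets']['andy'] = None   # also changes d['pets'], which is the same object
print(d['pets']['andy'])         # None
```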

My requests are:


Dimitris Fasarakis Hilliard

Immutable atomic objects are singletons

Nope, some are and some aren't; this is a detail of the CPython implementation.

  • Integers in the range [-5, 256] are cached: when one of these values is requested, the already existing object is returned. Numbers outside that range are not cached, but identical constants compiled together are stored once and reused, as a slight optimization. The cache is documented in the section on creating new PyLong objects.

    Also, see the following for a discussion on these:

  • String literals are subject to interning during compilation to bytecode, much like ints. The rules governing this are not as simple as for ints, though: only strings under a certain size and composed of certain characters are considered. I am not aware of any section in the docs specifying this; you could look at the behavior by reading here.

  • Floats, for example, which could be considered "atomic" (even though in Python that term doesn't have the meaning you think), have no singletons:

    i = 1.0
    j = 1.0
    i is j  # False when typed line by line at the interactive prompt

    They are still, of course, subject to constant reuse at compile time, so the same lines compiled together (in a module, say) can give True. See: 'is' operator behaves unexpectedly with floats
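A minimal demonstration of the points above, assuming CPython (other implementations such as PyPy behave differently). The helper runtime_sum_is_shared is hypothetical; it only exists to build ints at run time, out of reach of compile-time constant reuse. The exec call compiles two float assignments as one unit to mimic a module.

```python
def runtime_sum_is_shared(a: int, b: int) -> bool:
    """Compute a + b twice at run time and report whether both results are one object."""
    return (a + b) is (a + b)

# Results inside the small-int cache [-5, 256] are the cached objects
print(runtime_sum_is_shared(250, 6))    # 256: True in CPython
print(runtime_sum_is_shared(-3, -2))    # -5: True in CPython
# Just outside the cache, each addition allocates a new int
print(runtime_sum_is_shared(250, 7))    # 257: False in CPython
print(runtime_sum_is_shared(-3, -3))    # -6: False in CPython

# Floats are not cached: parsing the same text twice gives two objects
print(float('1.0') is float('1.0'))     # False

# Identical constants compiled together share one object,
# which is why `i is j` can be True inside a module
ns = {}
exec('i = 1.0\nj = 1.0', ns)
print(ns['i'] is ns['j'])               # True in CPython
```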

Immutable and Iterable objects might be singletons but not exactly instead they hash equally

Some empty immutable collections, such as the empty tuple, are singletons; this is again an implementation detail that isn't described in the Python reference and can really only be discovered by looking at the source.

See here for a look at the implementation: Why does '() is ()' return True when '[] is []' and '{} is {}' return False?
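A quick CPython check of the empty-tuple singleton (empty, a and b are illustrative names; the empty tuple is bound to a name first to avoid comparing a literal with is): every way of producing an empty tuple yields the shared object, while empty lists and dicts, and non-empty tuples built at run time, are fresh objects.

```python
empty = ()

# The empty tuple is one shared object in CPython, however it is produced
print(tuple() is empty, tuple([]) is empty)   # True True
# Mutable empty containers are always fresh objects
print([] is [], {} is {})                     # False False
# Non-empty tuples built at run time are equal but distinct
a = tuple([1, 2])
b = tuple([1, 2])
print(a == b, a is b)                         # True False
```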

Passing dictionary by double dereferencing works as shallow copy of it.

Yes, though the term isn't double dereferencing; it is unpacking.

Are those behaviours well documented somewhere?

Those that are considered implementation details needn't be documented the way, for example, the max function is. They are specific things that might easily change if the decision is made to do so.
