Aenaon

Reputation: 3563

Python: pass by reference

I am trying to understand the behaviour demonstrated below. Suppose we have the following simple class:

import pandas as pd

class A:
    def __init__(self):
        self._df = pd.DataFrame([1, 2, 3], columns=['c1'], index=['r1', 'r2', 'r3'])

    @property
    def df(self):
        return self._df

    def summary(self):
        df2 = self.df
        df2['c2'] = pd.Series([4, 5, 6], index=df2.index)
        return df2

and we make an instance a:

a = A()

Then a.df returns

    c1
r1  1
r2  2
r3  3

Calling the summary method

s = a.summary()
s

returns

    c1  c2
r1  1   4
r2  2   5
r3  3   6

which is fine so far. However, the df property (and the underlying _df attribute) of object a has now been changed too:

a.df

    c1  c2
r1  1   4
r2  2   5
r3  3   6

I broadly understand that this has to do with the much-discussed pass-by-reference design of Python, and I suspect the key is the line df2 = self.df in the summary method. That would make df2 a reference pointing to the same memory location as self.df. Therefore, when we modify the object labelled df2 (in this example, by adding one more column), we see the modified object through either df2 or self.df, because both point to the same thing.
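One way to test this understanding directly is to compare object identity with the `is` operator, which is True only when two names refer to the very same object. The sketch below is a trimmed copy of the question's class (the `summary` method is omitted), and it assumes pandas is installed:

```python
import pandas as pd

class A:
    def __init__(self):
        self._df = pd.DataFrame([1, 2, 3], columns=['c1'], index=['r1', 'r2', 'r3'])

    @property
    def df(self):
        return self._df

a = A()
df2 = a.df                  # binds a second name to the existing DataFrame; nothing is copied
print(df2 is a._df)         # True: both names refer to one DataFrame object
df2['c2'] = [4, 5, 6]       # adds a column to that single object in place
print(list(a.df.columns))   # ['c1', 'c2']: the change is visible through a.df as well
```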

Is this a fair understanding? What is the best practice if we want to decouple the two, i.e. edit df2 without affecting df? Should we make an explicit copy with df2 = self.df.copy(), or do something else?

Upvotes: 0

Views: 165

Answers (1)

Caleb Hattingh

Reputation: 9225

You are correct that df2 = self.df.copy() would resolve your problem, but this has nothing to do with pass by value or pass by reference. Those terms describe how arguments are passed in function calls, and neither of them accurately describes Python's function-call semantics anyway.
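As a sketch of that fix, here is the question's class with only the first line of `summary` changed. `DataFrame.copy()` makes a deep copy by default (`deep=True`), so the new column lands on a separate object. It assumes pandas is installed:

```python
import pandas as pd

class A:
    def __init__(self):
        self._df = pd.DataFrame([1, 2, 3], columns=['c1'], index=['r1', 'r2', 'r3'])

    @property
    def df(self):
        return self._df

    def summary(self):
        # copy() returns a new DataFrame with its own data, so the
        # column added below does not touch self._df
        df2 = self.df.copy()
        df2['c2'] = pd.Series([4, 5, 6], index=df2.index)
        return df2

a = A()
s = a.summary()
print(list(s.columns))     # ['c1', 'c2']
print(list(a.df.columns))  # ['c1']: the original is unchanged
```

`DataFrame.assign` is another option here: `self.df.assign(c2=[4, 5, 6])` also returns a new DataFrame and leaves the original as it was.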

For your situation, as well as for parameters in function calls, the easiest mental picture is Brett Cannon's idea of sticky labels. All names (the so-called variable names) are sticky labels that refer to an actual object. So you can stick as many labels on one object as you like:

def f():
    pass

g = h = k = m = f  # Creates 4 new sticky labels that all refer to a single function

class A:
    pass

g = h = k = m = A  # Reuses the same 4 labels; all now refer to one class

a = A()
g = h = k = m = a  # Reuses the same 4 labels; all now refer to one instance of A
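Function parameters are simply more sticky labels. The sketch below uses made-up names (`append_item`, `rebind`) to show the two cases. Mutating the object through a parameter is visible to the caller. Re-sticking the parameter label on a new object, or re-sticking any other label, leaves the caller's label where it was:

```python
def append_item(items):
    # 'items' is another label on the caller's list, so this mutation is shared
    items.append(4)

def rebind(items):
    # Moves the local label to a brand-new list; the caller's label is untouched
    items = [0]
    return items

nums = [1, 2, 3]
append_item(nums)
print(nums)        # [1, 2, 3, 4]

new = rebind(nums)
print(nums, new)   # [1, 2, 3, 4] [0]

alias = nums       # a second label on the same list
alias = "other"    # moves only the 'alias' label
print(nums)        # [1, 2, 3, 4]
```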

Reassigning a label (name) to a different object simply means that the previous object loses one reference. When an object has no references left, it gets deleted. In the example above, I created a bunch of references to an instance of the class A. Using the del statement you can remove names (i.e. sticky labels), but the underlying object only gets deleted when there are no more references to it.

del h
del k
del m
del a

# At this point, one active reference (the name g) still points to the
# instance that we first created when we assigned it to the name "a".

del g

# Now we have removed all references to that instance, so it gets deleted.
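To actually observe the moment of deletion, a minimal sketch can register a callback with the standard library's `weakref.finalize`, which runs once the object is reclaimed. The list name `deleted` and the message are made up for illustration. The immediate reclamation shown assumes CPython, where reference counting frees an object as soon as its last reference disappears; other implementations may delay it until a garbage-collection pass:

```python
import weakref

class A:
    pass

deleted = []  # hypothetical log filled by the finalizer callback

a = A()
g = h = a
# Calls deleted.append("instance reclaimed") once the instance is reclaimed
weakref.finalize(a, deleted.append, "instance reclaimed")

del h
del a
print(deleted)  # []: the label g still refers to the instance

del g
print(deleted)  # ['instance reclaimed'] on CPython: the last reference is gone
```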

Upvotes: 2
