I am trying to understand the behaviour demonstrated below. Suppose we have this simple class:
import pandas as pd

class A():
    def __init__(self):
        self._df = pd.DataFrame([1, 2, 3], columns=['c1'], index=['r1', 'r2', 'r3'])

    @property
    def df(self):
        return self._df

    def summary(self):
        df2 = self.df
        df2['c2'] = pd.Series([4, 5, 6], index=df2.index)
        return df2
and we make an instance a:
a = A()
Then a.df returns
    c1
r1   1
r2   2
r3   3
Calling the summary method,
s = a.summary()
s
returns
    c1  c2
r1   1   4
r2   2   5
r3   3   6
which is fine so far. However, the df property (and the underlying attribute) of object a has now been changed too:
a.df
    c1  c2
r1   1   4
r2   2   5
r3   3   6
I understand broadly that this has to do with the much-discussed pass-by-reference design of Python, and I suspect the key is the line df2 = self.df in the summary method. That line makes df2 a reference to the same object in memory as self.df. Therefore, when we modify the object labelled df2 (here by adding one more column), we see the modified object through either df2 or self.df, because both point to the same thing.
Is this a fair understanding? What is the best practice if we want to decouple the two, i.e. edit df2 without affecting df? Make an explicit copy with df2 = self.df.copy(), or something else?
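The aliasing described in the question can be reproduced without pandas. The sketch below uses a hypothetical class Holder that wraps a plain dict in place of the DataFrame (an assumption for illustration only); the is operator compares object identity, so it shows directly whether two names are bound to the same object.

```python
class Holder:
    """Hypothetical stand-in for class A, holding a dict instead of a DataFrame."""

    def __init__(self):
        self._data = {"c1": [1, 2, 3]}

    @property
    def data(self):
        # The property returns the stored object itself, not a copy.
        return self._data

    def summary(self):
        data2 = self.data          # binds a second name to the same dict
        data2["c2"] = [4, 5, 6]    # mutates that shared dict in place
        return data2


h = Holder()
s = h.summary()
print(s is h.data)      # True: both names refer to one object
print(sorted(h.data))   # ['c1', 'c2']: the "original" was changed too
```

Nothing is copied at any point: data2, s and h._data are three names for one dict.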
You are correct that df2 = self.df.copy() would resolve your problem, but this has nothing to do with pass by value or pass by reference. Those terms describe how arguments are passed in function calls, and neither of them accurately describes Python's function-call semantics anyway.
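A minimal sketch of the fix, assuming pandas is installed: summary works on an explicit copy, so adding a column no longer touches self._df. DataFrame.assign, which returns a new DataFrame with the extra column and leaves the caller unchanged, is shown as an alternative; the method name summary_assign is hypothetical.

```python
import pandas as pd


class A:
    def __init__(self):
        self._df = pd.DataFrame([1, 2, 3], columns=["c1"], index=["r1", "r2", "r3"])

    @property
    def df(self):
        return self._df

    def summary(self):
        # .copy() (deep=True by default) creates a new DataFrame with its own data,
        # so adding a column to df2 leaves self._df unchanged.
        df2 = self.df.copy()
        df2["c2"] = pd.Series([4, 5, 6], index=df2.index)
        return df2

    def summary_assign(self):
        # Hypothetical alternative: assign() returns a new DataFrame and
        # does not modify the one it is called on.
        return self.df.assign(c2=[4, 5, 6])


a = A()
s = a.summary()
print(list(s.columns))                   # ['c1', 'c2']
print(list(a.df.columns))                # ['c1']
print(list(a.summary_assign().columns))  # ['c1', 'c2']
```

Either way the caller gets a new object, so a.df keeps only column c1.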
For your situation, as well as for parameters in function calls, the easiest mental picture is Brett Cannon's idea of sticky labels. All names (e.g. so-called variable names) are sticky labels that refer to an actual object, and you can stick as many labels on one object as you like:
def f():
    pass

g = h = k = m = f  # You created 4 new sticky labels that refer to a single function

class A():
    pass

g = h = k = m = A  # Reusing the same 4 labels, all refer to one class

a = A()
g = h = k = m = a  # Reusing the same 4 labels, all refer to one instance
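Whether several labels are stuck on the same object can be checked with the is operator, which compares identity rather than equality. A short sketch, reusing the names from the example above but defining them afresh so that it runs on its own:

```python
def f():
    pass


class A:
    pass


a = A()
g = h = k = m = a   # four more labels on the same instance

print(g is h is k is m is a)  # True: one object, five names
print(A() is a)               # False: a new instance is a different object

# Rebinding one label moves only that label; the others still point at `a`.
g = f
print(g is a, h is a)         # False True
```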
Reassigning a label (name) to a different object simply means that the previous object loses a reference. When an object has no references left, it gets deleted (in CPython this happens immediately through reference counting; other implementations may reclaim it later). In the example above, I created a bunch of references to an instance of the class A. Using the del keyword you can remove names (i.e. sticky labels), but the underlying object is only deleted when there are no more references to it.
del h
del k
del m
del a
# At this point there is still one active reference (g) to the instance
# that we first created and assigned to the name "a".
del g
# Now we have removed all references to that instance, so it gets deleted.
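To watch the deletion happen, the sketch below registers a callback with weakref.finalize, which runs when the object is reclaimed. It assumes CPython, where reference counting frees an object as soon as its last reference disappears; other implementations such as PyPy may reclaim it later, so the timing shown is not guaranteed by the language itself.

```python
import weakref


class A:
    pass


events = []

a = A()
g = h = k = m = a
# Record a message when the instance is actually reclaimed.
# finalize holds only a weak reference to `a`, so it does not keep it alive.
weakref.finalize(a, events.append, "instance deleted")

del h, k, m, a
print(events)  # []: `g` still refers to the instance

del g
print(events)  # ['instance deleted'] on CPython: the last reference is gone
```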