Reputation: 1156
The following code was supposed to clarify how Python class variables behave,
but somehow it opens more questions than it solves.
The class Bodyguard has the variable protect, which is a list that by default contains the king.
The classes AnnoyingBodyguard and Bureaucrat change it.
Guards that protect specific people shall be called specific (bg_prime, bg_foreign, ...).
The others shall be called generic (bg1, bg2, bg3).
For specific guards the changes affect only those initialized after the change.
For generic guards the changes affect all of them, no matter when they were initialized.
Why the before/after difference for specific guards? Why the specific/generic difference?
These differences are somewhat surprising, but I find the following even stranger.
Given two lists a and b, one might think that these operations will always have the same result:
reassign: a = a + b
add-assign: a += b
append: for x in b: a.append(x)
Why do they cause completely different results when used in Bodyguard.__init__?
Only the results using reassign make any sense.
They can be seen below and in reassign_good.py.
The results for add-assign and append are quite useless, and I do not show them here.
But they can be seen in addassign_bad.py and append_bad.py.
class Bodyguard:
    protect = ['the king']

    def __init__(self, *args):
        if args:
            self.protect = self.protect + list(args)
##################################################################################
bg1 = Bodyguard()
bg_prime = Bodyguard('the prime minister')
bg_foobar = Bodyguard('the secretary of foo', 'the secretary of bar')
assert bg1.protect == ['the king']
assert bg_prime.protect == ['the king', 'the prime minister']
assert bg_foobar.protect == [
    'the king', 'the secretary of foo', 'the secretary of bar'
]
##################################################################################
class AnnoyingBodyguard(Bodyguard):
    Bodyguard.protect = ['his majesty the king']
bg2 = Bodyguard()
bg_foreign = Bodyguard('the foreign minister')
# The king's title was updated for all generic guards.
assert bg1.protect == bg2.protect == ['his majesty the king']
# And for specific guards initialized after AnnoyingBodyguard was defined.
assert bg_foreign.protect == ['his majesty the king', 'the foreign minister']
# But not for specific guards initialized before AnnoyingBodyguard was defined.
assert bg_prime.protect == ['the king', 'the prime minister']
assert bg_foobar.protect == [
    'the king', 'the secretary of foo', 'the secretary of bar'
]
##################################################################################
class Bureaucrat:
    def __init__(self, name):
        Bodyguard.protect.append(name)
malfoy = Bureaucrat('Malfoy')
bg3 = Bodyguard()
bg_paper = Bodyguard('the secretary of paperwork')
# Malfoy was added for all generic guards.
assert bg1.protect == bg2.protect == bg3.protect == [
    'his majesty the king', 'Malfoy'
]
# And for specific guards initialized after Malfoy:
assert bg_paper.protect == [
    'his majesty the king', 'Malfoy', 'the secretary of paperwork'
]
# But not for specific guards initialized before Malfoy:
assert bg_prime.protect == ['the king', 'the prime minister']
assert bg_foreign.protect == [
    'his majesty the king', 'the foreign minister'
]
Edit: Based on the comments and answers, I added the script reassign_better.py,
where the differences between generic and specific guards are removed.
The main class should look like this:
class Bodyguard:
    protect = ['the king']

    def __init__(self, *args):
        self.protect = self.protect[:]  # force reassign also for generic guards
        if args:
            self.protect = self.protect + list(args)
Upvotes: 0
Views: 119
Reputation: 77407
The difference is in how names are resolved on an object instance and how operators such as + and += are implemented. In
self.protect = self.protect + list(args)
Python performs the operation on the right-hand side and assigns the result to the left-hand side. First, self.protect is resolved. The instance self doesn't have an attribute "protect", so by Python's attribute lookup rules its class is checked next. The class-level Bodyguard.protect is found (it's ['the king']) and that value is used.
The + operation on a list creates a new list, combining both sides. There are several rules for the + operator, but most commonly it calls the __add__ method on the left operand and takes the return value as the result of the operation. All classes are free to decide what __add__ means to them. Lists think it should be a new list with the contents of both sides.
Now you have an anonymous list and the assignment self.protect = <that anonymous list>. That's an assignment to the instance object. Interestingly, the next time you use self.protect, this list is found and there is no reason to fall back to Bodyguard.protect. That's the point of the code. It's a way to provide a default list.
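A minimal sketch of that lookup, reusing the Bodyguard class from the question. The use of vars() to peek at the instance dictionary is my own addition for illustration, not something the question does:

```python
class Bodyguard:
    protect = ['the king']

    def __init__(self, *args):
        if args:
            # list + list builds a new list, which is stored on the instance.
            self.protect = self.protect + list(args)


bg1 = Bodyguard()
bg_prime = Bodyguard('the prime minister')

# A generic guard has no instance attribute, so lookup falls back to the class.
print(vars(bg1))                              # {}
print(bg1.protect is Bodyguard.protect)       # True

# A specific guard carries its own list, which shadows the class attribute.
print(vars(bg_prime))                         # {'protect': ['the king', 'the prime minister']}
print(bg_prime.protect is Bodyguard.protect)  # False
print(Bodyguard.protect)                      # ['the king']
```

This is also why later changes to Bodyguard.protect show up on generic guards but not on specific guards that already exist.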
Augmented addition is a bit different. Let's say you wrote
self.protect += list(args)
instead. Python resolves self.protect the same way: it's not on the instance object, so you get the list in Bodyguard.protect. Instead of __add__, Python calls __iadd__ and once again the result is used for assignment. In this case, list decided that __iadd__ should append to the list and return that original list. When Python assigns that list to self.protect, it's the updated list from Bodyguard.protect, which now has two references and the extra values.
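Here is a small sketch of the add-assign variant, assuming the only change to the question's class is swapping the reassign line for +=. It should show the class-level list picking up the extra entry:

```python
class Bodyguard:
    protect = ['the king']

    def __init__(self, *args):
        if args:
            # list.__iadd__ extends the class-level list in place; the same
            # list object is then bound as an instance attribute.
            self.protect += list(args)


bg1 = Bodyguard()
bg_prime = Bodyguard('the prime minister')

print(Bodyguard.protect)                      # ['the king', 'the prime minister']
print(bg1.protect)                            # ['the king', 'the prime minister']
print(bg_prime.protect is Bodyguard.protect)  # True
```

So the generic guard bg1 ends up protecting the prime minister too, even though it was created first.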
Note: AnnoyingBodyguard should define its own protect instead of overwriting its parent class:
class AnnoyingBodyguard(Bodyguard):
    protect = ['his majesty the king']
Now subclasses and instances of AnnoyingBodyguard get the more annoying protect list, but Bodyguard retains its original list. I mean, annoying is one thing, but changing protect on your parent class is downright sociopathic.
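A quick sketch to check this, using the reassign version of Bodyguard from the question. The variable names bg and abg are just placeholders I made up:

```python
class Bodyguard:
    protect = ['the king']

    def __init__(self, *args):
        if args:
            self.protect = self.protect + list(args)


class AnnoyingBodyguard(Bodyguard):
    # Shadows the parent's attribute instead of replacing it.
    protect = ['his majesty the king']


bg = Bodyguard('the prime minister')
abg = AnnoyingBodyguard('the foreign minister')

print(bg.protect)         # ['the king', 'the prime minister']
print(abg.protect)        # ['his majesty the king', 'the foreign minister']
print(Bodyguard.protect)  # ['the king']
```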
Upvotes: 2
Reputation: 130
This behavior comes from the difference between modifying a mutable object in place and creating a new one. The class variable protect is a list, and lists are mutable. Creating an instance does not copy the class variable; until the instance gets its own attribute, self.protect refers to the same list object as Bodyguard.protect.
Add-assign and append modify that list in place, so the change is seen through the class and through every instance that still shares it. Reassign creates a new list (a new object in memory) and binds it to the instance only.
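As a rough illustration with plain lists, outside of any class: the name shared below is a stand-in I introduced for a second reference to the same list, playing the role of the class variable.

```python
b = ['the prime minister']

# reassign: a + b builds a new list, so the shared object is untouched
shared = ['the king']
a = shared
a = a + b
reassign_result = (a is shared, list(shared))

# add-assign: list.__iadd__ extends the existing list in place
shared = ['the king']
a = shared
a += b
addassign_result = (a is shared, list(shared))

# append: also mutates the existing list in place
shared = ['the king']
a = shared
for x in b:
    a.append(x)
append_result = (a is shared, list(shared))

print(reassign_result)   # (False, ['the king'])
print(addassign_result)  # (True, ['the king', 'the prime minister'])
print(append_result)     # (True, ['the king', 'the prime minister'])
```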
Upvotes: 1
Reputation: 54812
Perhaps examples will clarify this. This is, in my view, the KEY point to understanding Python behind the scenes.
a = [1,2,3]
b = a
c = a
At this point, our program has exactly ONE list object. There happen to be three names bound to that one list. Modifying any of them modifies the one list, and will be visible everywhere:
b.append(4)
print(c)
This prints [1, 2, 3, 4]. However, if we do:
b = b + [5]
print(a)
print(b)
That creates a BRAND NEW list object and binds it to the name b. a and c are still bound to the original, so that prints:
[1, 2, 3, 4]
[1, 2, 3, 4, 5]
The way I like to think about this is that there are two different "spaces" in Python: there is an object space, filled with thousands of anonymous objects that do not have a name, and there is a namespace, which contains names that are bound to objects. It's important to recognize this. Names do not have values. They are merely bound to objects. And this includes EVERY name: variables, functions, classes, modules, etc.
Note that this confusion does not actually require separate names. Take, for example, the very common error:
a = [[0] * 10] * 10
Many would think this creates 10 different lists. That's not so. This creates exactly TWO lists: one that contains 10 zeros, and one that contains 10 references to that list. So if you do:
a[5][5] = 7
that change is seen in all ten elements of a.
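For comparison, a short sketch of the usual way around this: build each inner list separately with a list comprehension. The names shared_rows and grid are mine, chosen for the example.

```python
# Ten references to ONE inner list.
shared_rows = [[0] * 10] * 10
shared_rows[5][5] = 7
print(sum(row[5] for row in shared_rows))     # 70
print(len({id(row) for row in shared_rows}))  # 1

# The comprehension evaluates [0] * 10 once per row: ten distinct lists.
grid = [[0] * 10 for _ in range(10)]
grid[5][5] = 7
print(sum(row[5] for row in grid))            # 7
print(len({id(row) for row in grid}))         # 10
```

The inner [0] * 10 is harmless because integers are immutable; only the repeated reference to a mutable list causes the surprise.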
Upvotes: 1