Reputation: 1579

What's the correct way to implement a descriptor?

Consider the code below (run on Python 3.6).

Bar (using DescriptorA) stores the value on the descriptor instance.

Bat (using DescriptorB) stores the value on the instance of the containing class.

Code examples I've seen (and used, to my endless frustration) follow the Bar pattern, such as this site and the Python docs.

As the output shows, with the Bar pattern two instances of a class can't use the same descriptor.

Or am I missing something?

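class DescriptorA(object):
    value = None
    def __get__(self, instance, owner):
        # reads from the descriptor itself, which every Bar instance shares
        return self.value

    def __set__(self, instance, value):
        # writes to the descriptor itself, so the last assignment wins
        self.value = value

class DescriptorB(object):
    _value = None
    def __get__(self, instance, owner):
        # reads from the Bat instance the attribute was accessed on
        return instance._value

    def __set__(self, instance, value):
        # writes to that Bat instance, so each instance keeps its own value
        instance._value = value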


class Bar(object):
    foo = DescriptorA()
    def __init__(self, foo):
        self.foo = foo

class Bat(object):
    foo = DescriptorB()
    def __init__(self, foo):
        self.foo = foo


print('BAR')
a = Bar(1)
print('a', a.foo)

b = Bar(2)
print('b', b.foo)
print('Checking a')
print('a', a.foo)

print('BAT')
c = Bat(3)
print('c', c.foo)

d = Bat(4)
print('d', d.foo)
print('Checking c')
print('c', c.foo)

Output:

BAR
a 1
b 2
Checking a
a 2
BAT
c 3
d 4
Checking c
c 3

UPDATE

In response to the good answers, I wanted to add this: when not using descriptors, but still using class attributes, we get different behavior. This is why I made the mistake of using DescriptorA. For example:

class Bar(object):
    foo = None
    def __init__(self, foo):
        self.foo = foo

class Bat(object):
    foo = None
    def __init__(self, foo):
        self.foo = foo


print('BAR')
a = Bar(1)
print('a', a.foo)

b = Bar(2)
print('b', b.foo)
print('Checking a')
print('a', a.foo)

print('BAT')
c = Bat(3)
print('c', c.foo)

d = Bat(4)
print('d', d.foo)
print('Checking c')
print('c', c.foo)

Output:

BAR
a 1
b 2
Checking a
a 1
BAT
c 3
d 4
Checking c
c 3

Upvotes: 2

Answers (3)

MSeifert

Reputation: 152637

It really depends on your use-case how you store the variables.

There are four objects, each with its own set of attributes:

The descriptor class
Instances of the descriptor
The normal class
Instances of the normal class

Normally descriptor instances are stored in the "normal class", because the descriptor protocol is not invoked when you store the descriptor on the instance. You could also "go meta" and use descriptors on metaclasses, or descriptors inside descriptors, but let's ignore these for the sake of keeping it short and staying sane (it's not really difficult, but probably a bit too broad).
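A small sketch of that point (the class names Verbose and Normal are hypothetical): the same kind of descriptor object triggers __get__ when it is a class attribute, but is returned as a plain object when it is stored on an instance.

```python
class Verbose(object):
    """Hypothetical non-data descriptor, used only for this illustration."""
    def __get__(self, instance, owner):
        return "descriptor __get__ called"

class Normal(object):
    on_class = Verbose()              # class attribute: the descriptor protocol applies

obj = Normal()
obj.on_instance = Verbose()           # instance attribute: the protocol is NOT applied

print(obj.on_class)                   # descriptor __get__ called
print(type(obj.on_instance).__name__) # Verbose (the raw descriptor object)
```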

So in case of your DescriptorA you store:

value = None in the descriptor class
value = ? in the descriptor instance (at least after __set__ has been called once)
foo = descriptor instance in the normal class
nothing in the class instance

In the case of DescriptorB you store:

_value = None in the descriptor class
nothing in the descriptor instance
foo = descriptor instance in the normal class
_value = ? in the instance of your normal class

See the difference? In the first case different instances of your normal class access the same descriptor instance, so everything is shared. In the second case you store everything on the instance of your class and nothing in the descriptor instance so nothing is shared.
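To see where each version actually keeps its data, one can inspect the __dict__ of the descriptor instances and of the normal instances with vars(). This sketch reuses the classes from the question unchanged:

```python
class DescriptorA(object):
    value = None
    def __get__(self, instance, owner):
        return self.value
    def __set__(self, instance, value):
        self.value = value

class DescriptorB(object):
    _value = None
    def __get__(self, instance, owner):
        return instance._value
    def __set__(self, instance, value):
        instance._value = value

class Bar(object):
    foo = DescriptorA()
    def __init__(self, foo):
        self.foo = foo

class Bat(object):
    foo = DescriptorB()
    def __init__(self, foo):
        self.foo = foo

a, b = Bar(1), Bar(2)
c, d = Bat(3), Bat(4)

# DescriptorA: the value lives on the single descriptor instance in Bar.__dict__
print(vars(Bar.__dict__['foo']))   # {'value': 2}
print(vars(a), vars(b))            # {} {}

# DescriptorB: the value lives on each Bat instance
print(vars(Bat.__dict__['foo']))   # {}
print(vars(c), vars(d))            # {'_value': 3} {'_value': 4}
```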

Note that your DescriptorB seems strange: why store _value = None in the descriptor class when you never use it? Remember that in __get__ you access the _value of your instance of the normal class, not the _value of your descriptor instance!

As I said before, it depends on your use-case which approach to choose. Normally you want some shared attributes and some per-instance attributes. You could also share attributes between all instances of your descriptor. And since __get__ receives the owner class and __set__ can use type(instance), you could even modify class attributes of your normal class.
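A minimal sketch of that last idea, assuming a hypothetical ClassLevel descriptor and Counter class (neither comes from the question): the storage name is shared configuration on the descriptor, while the value itself is written to the owner class through type(instance), so every instance sees it.

```python
class ClassLevel(object):
    """Hypothetical descriptor that reads/writes an attribute on the owner class."""
    def __init__(self, storage_name):
        self.storage_name = storage_name   # shared: the same for every instance

    def __get__(self, instance, owner):
        # `owner` is the class, even when accessed through an instance
        return getattr(owner, self.storage_name)

    def __set__(self, instance, value):
        # write to the class, so every instance sees the change
        setattr(type(instance), self.storage_name, value)

class Counter(object):
    _total = 0
    total = ClassLevel('_total')

x, y = Counter(), Counter()
x.total = 5
print(y.total)         # 5
print(Counter._total)  # 5
```

For an instance of a subclass, type(instance) is the subclass, so the write creates a class attribute there rather than on Counter.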

Take, for example, the example from the Python docs:

class RevealAccess(object):
    """A data descriptor that sets and returns values
       normally and prints a message logging their access.
    """

    def __init__(self, initval=None, name='var'):
        self.val = initval
        self.name = name

    def __get__(self, obj, objtype):
        print('Retrieving', self.name)
        return self.val

    def __set__(self, obj, val):
        print('Updating', self.name)
        self.val = val

>>> class MyClass(object):
...     x = RevealAccess(10, 'var "x"')
...     y = 5
...

They deliberately created a descriptor for a class variable. In that case there is just no "instance", and it doesn't matter that the value is shared, because a class variable is shared by all instances anyway. That means that even if you set the variable through an instance, it changes for all other instances.
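A quick way to confirm the sharing, reusing the docs' RevealAccess class: assigning through one instance changes what every other instance sees, while the plain class attribute y is merely shadowed on the instance that assigned it.

```python
class RevealAccess(object):
    def __init__(self, initval=None, name='var'):
        self.val = initval
        self.name = name

    def __get__(self, obj, objtype):
        print('Retrieving', self.name)
        return self.val

    def __set__(self, obj, val):
        print('Updating', self.name)
        self.val = val

class MyClass(object):
    x = RevealAccess(10, 'var "x"')
    y = 5

m1, m2 = MyClass(), MyClass()
m1.x = 20          # prints: Updating var "x"
print(m2.x)        # prints: Retrieving var "x", then 20
m1.y = 7           # plain class attribute: only shadowed on m1
print(m2.y)        # 5
```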

So you shouldn't really use descriptor instance variables if you don't want them to be shared. However, you should use them for everything that should be shared (e.g. the name of the attribute, etc.).
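A sketch of that split, assuming Python 3.6+ for __set_name__ (the Field name is hypothetical): the descriptor instance only keeps the attribute name, which is legitimately shared, and each value goes into the owning instance's own __dict__.

```python
class Field(object):
    """Hypothetical descriptor: shared data (the name) on the descriptor,
    per-instance data in each instance's __dict__. Needs Python 3.6+."""
    def __set_name__(self, owner, name):
        # called once at class creation; `name` is the attribute name, e.g. "foo"
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self            # accessed on the class: return the descriptor
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value   # per-instance storage

class Bar(object):
    foo = Field()
    baz = Field()
    def __init__(self, foo, baz):
        self.foo = foo
        self.baz = baz

a = Bar(1, 'x')
b = Bar(2, 'y')
print(a.foo, b.foo)   # 1 2
print(a.baz, b.baz)   # x y
```

Because Field defines __set__, it is a data descriptor and takes precedence over the instance __dict__ entry with the same name, so every read and write still goes through it.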

It might also be interesting to look at how Python looks up attributes. I generally find this image from this blog very informative:

Upvotes: 1

bruno desthuilliers

Reputation: 77892

Descriptors are class attributes (and they have to be for the descriptor protocol to work). Being class attributes means there's one single descriptor instance shared by all instances of the class (and its subclasses), so what you observe with class Bar and DescriptorA is the expected behaviour.

It doesn't mean that "two instances of a class can't use the same descriptor (instance)"; they actually do, and that's why you see this behaviour. It means that you cannot store per-instance values on your descriptor instance, at least not that simply.

One possible solution would be to maintain an id(instance): instance_value mapping in your descriptor, i.e.:

class DescriptorA(object):
    def __init__(self, default=None):
        self._values = {}          # id(instance) -> value, one entry per instance
        self._default = default
    def __get__(self, instance, cls):
        if instance is None:
            return self            # accessed on the class itself
        return self._values.get(id(instance), self._default)
    def __set__(self, instance, value):
        self._values[id(instance)] = value

but this has quite a few drawbacks, the first obvious one being that your _values dict will not be cleared when an instance is garbage-collected. It might end up eating a lot of RAM in a long-running process...
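One way around that drawback (a swapped-in technique, not what the answer above uses) is weakref.WeakKeyDictionary, which drops an entry once its key object is garbage-collected. It also avoids a subtler problem with id(): ids can be reused after an object dies, so a new instance could pick up a stale value. This sketch assumes the instances are hashable and support weak references, which holds for ordinary classes but not, for example, for classes that define __eq__ without __hash__, or that use __slots__ without '__weakref__'.

```python
import gc
import weakref

class DescriptorA(object):
    """Variant of the id() mapping that keys on the instance via weak references."""
    def __init__(self, default=None):
        self._values = weakref.WeakKeyDictionary()  # instance -> value
        self._default = default

    def __get__(self, instance, cls):
        if instance is None:
            return self
        return self._values.get(instance, self._default)

    def __set__(self, instance, value):
        self._values[instance] = value

class Bar(object):
    foo = DescriptorA()
    def __init__(self, foo):
        self.foo = foo

a, b = Bar(1), Bar(2)
print(a.foo, b.foo)                       # 1 2
print(len(Bar.__dict__['foo']._values))   # 2
del b
gc.collect()                              # make sure b has been reclaimed
print(len(Bar.__dict__['foo']._values))   # 1: b's entry is gone
```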

Edit: the code in your update is not using class attributes. Having a class attribute with the same name is irrelevant here: the initializer sets an instance attribute that shadows the class-level one (which is actually never used in your code snippet).
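The shadowing can be checked directly with the Bar class from the update: each instance gets its own foo in its __dict__, the class attribute stays None, and deleting the instance attribute makes the class attribute visible again.

```python
class Bar(object):
    foo = None
    def __init__(self, foo):
        self.foo = foo        # creates an instance attribute; the class attribute is untouched

a = Bar(1)
b = Bar(2)
print(vars(a), vars(b))   # {'foo': 1} {'foo': 2}
print(Bar.foo)            # None
del a.foo                 # remove the instance attribute...
print(a.foo)              # None: ...and the class attribute shows through again
```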

If you want a meaningful example with class attributes, use a mutable object and mutate it instead of creating an instance attribute:

>>> class Foo(object):
...    bar = []
...    def __init__(self, baaz):
...        self.baaz = baaz
...        self.bar.append(baaz)
... 
>>> f1 = Foo("foo1")
>>> f1.baaz
'foo1'
>>> f1.bar
['foo1']
>>> f2 = Foo("foo2")
>>> f1.baaz
'foo1'
>>> f2.baaz
'foo2'
>>> f1.bar
['foo1', 'foo2']
>>> f2.bar
['foo1', 'foo2']
>>>

Upvotes: 1

hspandher

Reputation: 16733

A descriptor is defined at class level, and there's only one instance of that descriptor in the class. So in the first descriptor, DescriptorA, you're storing the value as an attribute of the descriptor, not of the instance object. That value is therefore overwritten whenever another instance sets it (here, on every instantiation).

Any value you store on the descriptor is the same for all instances of the class the descriptor is assigned to. That's why DescriptorB, not DescriptorA, is the correct way to use descriptors, unless your use case requires values that should stay the same across instances.
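One caveat with DescriptorB as written, noted here as an aside: its storage name _value is hard-coded, so two DescriptorB attributes on the same class overwrite each other. The sketch below shows the collision with a hypothetical Point class, then a variant (the hypothetical NamedDescriptor) that takes the storage name as shared, per-descriptor configuration.

```python
class DescriptorB(object):
    _value = None
    def __get__(self, instance, owner):
        return instance._value
    def __set__(self, instance, value):
        instance._value = value

class Point(object):
    x = DescriptorB()
    y = DescriptorB()          # uses the same hard-coded slot as x
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(p.x, p.y)                # 2 2: y overwrote x in p._value

class NamedDescriptor(object):
    def __init__(self, storage_name):
        self.storage_name = storage_name   # shared config, safe on the descriptor
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return getattr(instance, self.storage_name)
    def __set__(self, instance, value):
        setattr(instance, self.storage_name, value)  # per-instance value

class Point2(object):
    x = NamedDescriptor('_x')
    y = NamedDescriptor('_y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

q, r = Point2(1, 2), Point2(3, 4)
print(q.x, q.y, r.x, r.y)      # 1 2 3 4
```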

Upvotes: 1
