n1k31t4

Reputation: 2874

Storing passed data in object twice with `attrs` package

I am creating a data provider class that will hold data, perform transformations and make it available to other classes.

If the user creates an instance of this class and passes some data at instantiation, I would like to store it twice: once as the working data for all transformations and once as a copy of the original data. Let's assume the data itself has a copy method.

I am using the attrs package to create classes, but would also be interested in best approaches to this in general (perhaps there is a better way of getting what I am after?)

Here is what I have so far:

@attr.s
class DataContainer(object):
    """Interface for managing data. Reads and writes data, acts as a provider to other classes.
    """

    data = attr.ib(default=attr.Factory(list))
    data_copy = data.copy()

    def my_func(self, param1='all'):
        """Do something useful"""
        return param1

This doesn't work: AttributeError: '_CountingAttr' object has no attribute 'copy'

I also cannot write data_copy = self.data.copy() in the class body; that raises NameError: name 'self' is not defined.

The working equivalent without the attrs package would be:

class DataContainer(object):
    """Interface for managing data. Reads and writes data, acts as a provider to other classes.
    """
    def __init__(self, data):
        "Init method, saving passed data and a backup copy"
        self.data = data
        self.data_copy = data

EDIT:

As pointed out by @hynek, my simple init method above needs to be corrected to make an actual copy of the data: i.e. self.data_copy = data.copy(). Otherwise both self.data and self.data_copy would point to the same object.
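For reference, here is a minimal sketch of the corrected plain-Python version. It assumes, as stated above, that the data has a copy method; the list used to try it out is only an illustration.

```python
class DataContainer(object):
    """Holds working data plus an untouched backup copy."""

    def __init__(self, data):
        self.data = data
        # Assumes `data` has a .copy() method (list, dict, NumPy array, ...)
        self.data_copy = data.copy()


container = DataContainer([1, 2, 3])
container.data.append(4)

print(container.data)       # [1, 2, 3, 4]
print(container.data_copy)  # [1, 2, 3]
```

With the copy in place, changes to data no longer show up in data_copy.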

Upvotes: 1

Views: 668

Answers (2)

hynek

Reputation: 4146

You can do two things here.

The first one you've found yourself: you use __attrs_post_init__.

The second one is to have a default:

>>> import attr
>>> @attr.s
... class C:
...     x = attr.ib()
...     _x_backup = attr.ib()
...     @_x_backup.default
...     def _copy_x(self):
...         return self.x.copy()
>>> l = [1, 2, 3]
>>> i = C(l)
>>> i
C(x=[1, 2, 3], _x_backup=[1, 2, 3])
>>> i.x.append(4)
>>> i
C(x=[1, 2, 3, 4], _x_backup=[1, 2, 3])
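A possible variation, offered as a sketch rather than the one right way: in the example above the backup is still a constructor argument (attrs strips the leading underscore, so it is exposed as x_backup), which means a caller could pass something else in. Adding init=False keeps it out of the generated __init__ while the decorator-based default still runs. The class name below is just for illustration.

```python
import attr


@attr.s
class WithHiddenBackup:
    x = attr.ib()
    # init=False: not a constructor argument, but the default is still applied
    _x_backup = attr.ib(init=False)

    @_x_backup.default
    def _copy_x(self):
        # Runs during __init__, after x has been set (attributes are
        # initialised in definition order). Assumes x has a .copy() method.
        return self.x.copy()


original = [1, 2, 3]
instance = WithHiddenBackup(original)
instance.x.append(4)

print(instance.x)          # [1, 2, 3, 4]
print(instance._x_backup)  # [1, 2, 3]
```

Trying to pass a second positional argument to WithHiddenBackup now raises a TypeError, so the backup can only ever come from x.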

JFTR, your example of

def __init__(self, data):
    self.data = data
    self.data_copy = data

is wrong because you’d assign the same object twice which means that modifying self.data also modifies self.data_copy and vice versa.
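One related caveat, shown as a small sketch with made-up data: for built-in containers such as lists, .copy() is a shallow copy, so nested mutable objects are still shared between the original and the copy. If the data can be nested, copy.deepcopy from the standard library is one option.

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = nested.copy()        # new outer list, same inner lists
deep = copy.deepcopy(nested)   # inner lists are copied as well

nested[0].append(99)

print(shallow[0])  # [1, 2, 99]  (inner list is shared with `nested`)
print(deep[0])     # [1, 2]      (fully independent)
```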

Upvotes: 1

n1k31t4

Reputation: 2874

After looking through the documentation a little more deeply (scroll right to the bottom), I found that there is a kind of post-init hook for classes that are created by attrs.

You can just include a special __attrs_post_init__ method that can do the more complicated things one might want to do in an __init__ method, beyond simple assignment.

Here is my final working code:

In [1]: import attr
     ...: import numpy as np
     ...: 
     ...: @attr.s
     ...: class DataContainer(object):
     ...:    """Interface for managing data. Reads and writes data,
     ...:    acts as a provider to other classes.
     ...:    """
     ...: 
     ...:    data = attr.ib()
     ...: 
     ...:    def __attrs_post_init__(self):
     ...:        """Perform additional init work on instantiation.
     ...:        Make a copy of the raw input data.
     ...:        """
     ...:        self.data_copy = self.data.copy()



In [2]: some_data = np.array([[1, 2, 3], [4, 5, 6]])

In [3]: foo = DataContainer(some_data)

In [4]: foo.data
Out[4]: 
array([[1, 2, 3],
       [4, 5, 6]])

In [6]: foo.data_copy
Out[6]: 
array([[1, 2, 3],
       [4, 5, 6]])

Just to be doubly sure, I checked that the two attributes do not reference the same object. They do not: the copy method on a NumPy array returns a new array with its own data, so changing foo.data leaves foo.data_copy untouched.

In [8]: foo.data[0,0] = 999

In [9]: foo.data
Out[9]: 
array([[999,   2,   3],
       [  4,   5,   6]])

In [11]: foo.data_copy
Out[11]: 
array([[1, 2, 3],
       [4, 5, 6]])

Upvotes: 0
