Reputation: 2874
I am creating a data provider class that will hold data, perform transformations and make it available to other classes.
If the user creates an instance of this class and passes some data at instantiation, I would like to store it twice: once for all transformations and once as a copy of the original data. Let's assume the data itself has a copy method.
I am using the attrs package to create classes, but would also be interested in the best approach to this in general (perhaps there is a better way of getting what I am after?)
Here is what I have so far:
import attr


@attr.s
class DataContainer(object):
    """Interface for managing data. Reads and writes data, acts as a provider to other classes.
    """
    data = attr.ib(default=attr.Factory(list))
    data_copy = data.copy()

    def my_func(self, param1='all'):
        """Do something useful"""
        return param1
This doesn't work: AttributeError: '_CountingAttr' object has no attribute 'copy'
I also cannot write data_copy = self.data.copy(); that gives the error NameError: name 'self' is not defined.
The working equivalent without the attrs package would be:
class DataContainer(object):
    """Interface for managing data. Reads and writes data, acts as a provider to other classes.
    """
    def __init__(self, data):
        "Init method, saving passed data and a backup copy"
        self.data = data
        self.data_copy = data
As pointed out by @hynek, my simple init method above needs to be corrected to make an actual copy of the data, i.e. self.data_copy = data.copy(). Otherwise both self.data and self.data_copy would point to the same object.
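For reference, the corrected plain-Python version might look like the sketch below. It assumes, as the question does, that the data object has a copy() method; a plain list stands in for the data in the usage lines.

```python
class DataContainer(object):
    """Interface for managing data. Reads and writes data, acts as a provider to other classes."""

    def __init__(self, data):
        """Save the passed data and a separate backup copy."""
        self.data = data
        # data.copy() creates a new object, so later in-place changes
        # to self.data do not show up in self.data_copy.
        self.data_copy = data.copy()


container = DataContainer([1, 2, 3])
container.data.append(4)
print(container.data)       # [1, 2, 3, 4]
print(container.data_copy)  # [1, 2, 3]
```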
Upvotes: 1
Views: 668
Reputation: 4146
You can do two things here.
The first one you've found yourself: you use __attrs_post_init__.
The second one is to have a default:
>>> import attr
>>> @attr.s
... class C:
...     x = attr.ib()
...     _x_backup = attr.ib()
...     @_x_backup.default
...     def _copy_x(self):
...         return self.x.copy()
...
>>> l = [1, 2, 3]
>>> i = C(l)
>>> i
C(x=[1, 2, 3], _x_backup=[1, 2, 3])
>>> i.x.append(4)
>>> i
C(x=[1, 2, 3, 4], _x_backup=[1, 2, 3])
JFTR, your example of
def __init__(self, data):
    self.data = data
    self.data_copy = data
is wrong because you'd assign the same object to both attributes, which means that modifying self.data in place also modifies self.data_copy, and vice versa.
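A quick way to see the difference is to compare object identity with the is operator. This sketch uses a plain list to stand in for the data.

```python
data = [1, 2, 3]

# Assigning the same object twice: both names refer to one list.
alias = data
# list.copy() returns a new (shallow) copy of the list.
backup = data.copy()

data.append(4)

print(alias is data)   # True
print(backup is data)  # False
print(alias)           # [1, 2, 3, 4]
print(backup)          # [1, 2, 3]
```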
Upvotes: 1
Reputation: 2874
After looking through the documentation a little more deeply (scroll right to the bottom), I found that there is a kind of post-init hook for classes that are created by attrs.
You can just include a special __attrs_post_init__ method that can do the more complicated things one might want to do in an __init__ method, beyond simple assignment.
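Since the question also asks about approaches in general: the standard library's dataclasses module (Python 3.7+) offers an analogous hook, __post_init__, which may suit if you would rather avoid the attrs dependency. This is a swapped-in stdlib illustration, not the attrs API. Here field(init=False) keeps the copy out of the generated __init__ signature, and repr=False hides it from the repr.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataContainer:
    """Holds working data plus an untouched backup copy (stdlib dataclasses sketch)."""

    data: Any
    # init=False: not a constructor argument; it is filled in by __post_init__.
    data_copy: Any = field(init=False, repr=False)

    def __post_init__(self):
        # Runs at the end of the generated __init__, after self.data is set.
        self.data_copy = self.data.copy()


container = DataContainer([1, 2, 3])
container.data.append(4)
print(container)            # DataContainer(data=[1, 2, 3, 4])
print(container.data_copy)  # [1, 2, 3]
```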
Here is my final working code (run after import attr and import numpy as np):
In [1]: @attr.s
   ...: class DataContainer(object):
   ...:     """Interface for managing data. Reads and writes data,
   ...:     acts as a provider to other classes.
   ...:     """
   ...:
   ...:     data = attr.ib()
   ...:
   ...:     def __attrs_post_init__(self):
   ...:         """Perform additional init work on instantiation.
   ...:         Make a copy of the raw input data.
   ...:         """
   ...:         self.data_copy = self.data.copy()
In [2]: some_data = np.array([[1, 2, 3], [4, 5, 6]])
In [3]: foo = DataContainer(some_data)
In [4]: foo.data
Out[4]:
array([[1, 2, 3],
       [4, 5, 6]])
In [5]: foo.data_copy
Out[5]:
array([[1, 2, 3],
       [4, 5, 6]])
Just to be doubly sure, I checked that the two attributes do not reference the same object. They don't, because the copy method on a NumPy array returns a new array with its own copy of the data.
In [6]: foo.data[0,0] = 999
In [7]: foo.data
Out[7]:
array([[999,   2,   3],
       [  4,   5,   6]])
In [8]: foo.data_copy
Out[8]:
array([[1, 2, 3],
       [4, 5, 6]])
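One caveat: for a numeric NumPy array, copy() duplicates the underlying data, but for nested Python containers such as a list of lists, list.copy() is shallow, so the inner lists are still shared with the original. If the data might be nested, copy.deepcopy may be the safer choice for the backup. This sketch assumes plain nested lists.

```python
import copy

nested = [[1, 2, 3], [4, 5, 6]]

shallow = nested.copy()       # new outer list, same inner lists
deep = copy.deepcopy(nested)  # inner lists are copied too

nested[0][0] = 999

print(shallow[0][0])  # 999: the inner list is shared with nested
print(deep[0][0])     # 1: the deep copy is independent
```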
Upvotes: 0