I have a complex object I'd like to build around a pandas DataFrame. I've tried to do this with a subclass, but appending to the DataFrame reinitializes all properties in a new instance, even when using _metadata as recommended here. I know subclassing pandas objects is not recommended, but I don't know how to do what I want with composition (or any other method), so if someone can tell me how to do this without subclassing, that would be great.
I'm working with the following code:
import pandas as pd

class thisDF(pd.DataFrame):

    @property
    def _constructor(self):
        return thisDF

    _metadata = ['new_property']

    def __init__(self, data=None, index=None, columns=None, copy=False, new_property='reset'):
        super(thisDF, self).__init__(data=data, index=index, columns=columns, dtype='str', copy=copy)
        self.new_property = new_property

cols = ['A', 'B', 'C']
new_property = cols[:2]
tdf = thisDF(columns=cols, new_property=new_property)
As in the examples I linked to above, operations like tdf[['A', 'B']].new_property work fine. However, modifying the data in a way that creates a new copy initializes a new instance that doesn't retain new_property. So the code
print(tdf.new_property)
tdf = tdf.append(pd.Series(['a', 'b', 'c'], index=tdf.columns), ignore_index=True)
print(tdf.new_property)
outputs
['A', 'B']
reset
How do I extend pd.DataFrame so that thisDF.append() retains instance attributes (or some equivalent data structure if not using a subclass)? Note that I can do everything I want by making a class with a DataFrame as an attribute, but I don't want to do my_object.dataframe.some_method() for all DataFrame operations.
Upvotes: 3
Views: 1865
"[...] or wrapping all DataFrame methods with my_object class methods (because I'm assuming that would be a lot of work, correct?)"
No, it doesn't have to be a lot of work. You don't have to wrap every method of the wrapped object yourself. You can use __getattr__ to pass attribute lookups down to your wrapped object like this:
class WrappedDataFrame:

    def __init__(self, df, new_property):
        self._df = df
        self.new_property = new_property

    def __getattr__(self, attr):
        if attr in self.__dict__:
            return getattr(self, attr)
        return getattr(self._df, attr)

    def __getitem__(self, item):
        return self._df[item]

    def __setitem__(self, item, data):
        self._df[item] = data
__getattr__ is a dunder method that Python calls only when normal attribute lookup fails, that is, when the requested attribute or method is not found on the instance or its class. In my implementation, __getattr__ first checks the instance's own __dict__ (a safeguard that is rarely hit, since anything stored there is normally found before __getattr__ is ever called). Otherwise, it looks the attribute up on the wrapped DataFrame with getattr and returns it, so a forwarded method is then executed on the DataFrame.
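To make the lookup order concrete, here is a minimal sketch using only the standard library. Inner is a hypothetical stand-in for the wrapped DataFrame, and the seen list and the underscore guard are my own additions for illustration, not part of the class above. The guard is there so that a missing private attribute such as _inner raises AttributeError instead of recursing.

```python
seen = []  # names that actually reached __getattr__ (illustration only)


class Inner:
    """Hypothetical stand-in for the wrapped DataFrame."""

    def __init__(self):
        self.shape = (0, 3)

    def head(self):
        return "Inner.head() called"


class Wrapper:
    def __init__(self, inner, new_property):
        self._inner = inner
        self.new_property = new_property

    def __getattr__(self, attr):
        # Reached only after normal lookup on the instance and class failed.
        if attr.startswith("_"):
            # Avoid infinite recursion if _inner itself is not set yet.
            raise AttributeError(attr)
        seen.append(attr)
        return getattr(self._inner, attr)


w = Wrapper(Inner(), ["A", "B"])
w.new_property  # found on the instance, so __getattr__ is not involved
w.shape         # not defined on Wrapper, forwarded to Inner
w.head()        # same: the bound method comes from Inner
print(seen)     # ['shape', 'head']
```

Only shape and head show up in seen, because new_property is resolved by the normal instance lookup before __getattr__ gets a chance to run.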
So for most purposes this class behaves like a DataFrame. Keep in mind that forwarded methods return plain pandas objects, not WrappedDataFrame instances, so you still need to implement the methods you want to behave differently, like append in your example.
You could either make it so that append modifies the wrapped DataFrame object:
def append(self, *args, **kwargs):
    self._df = self._df.append(*args, **kwargs)
or so that it returns a new instance of the WrappedDataFrame class, which of course keeps all your functionality:
def append(self, *args, **kwargs):
    # Pass new_property along; __init__ requires it as a second argument.
    return self.__class__(self._df.append(*args, **kwargs), self.new_property)
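If you would rather not override each method by hand, one possible extension is to re-wrap every DataFrame result inside __getattr__. The sketch below is only that, a sketch: _rewrap and append_row are hypothetical names I made up, and it uses pd.concat because DataFrame.append was removed in pandas 2.0. It only intercepts bound methods (checked with inspect.ismethod), so indexers such as loc are passed through untouched.

```python
import inspect

import pandas as pd


class WrappedDataFrame:
    def __init__(self, df, new_property):
        self._df = df
        self.new_property = new_property

    def _rewrap(self, result):
        # Hypothetical helper: wrap DataFrame results, pass anything else through.
        if isinstance(result, pd.DataFrame):
            return type(self)(result, self.new_property)
        return result

    def __getattr__(self, attr):
        if attr.startswith("_"):
            # Guard against recursion if _df is not set yet.
            raise AttributeError(attr)
        value = getattr(self._df, attr)
        if inspect.ismethod(value):
            # Forward the call and re-wrap whatever the method returns.
            def forwarded(*args, **kwargs):
                return self._rewrap(value(*args, **kwargs))
            return forwarded
        return self._rewrap(value)

    def __getitem__(self, item):
        return self._rewrap(self._df[item])

    def __setitem__(self, item, data):
        self._df[item] = data

    def append_row(self, values):
        # Hypothetical replacement for append, built on pd.concat.
        row = pd.DataFrame([values], columns=self._df.columns)
        return self._rewrap(pd.concat([self._df, row], ignore_index=True))


cols = ["A", "B", "C"]
wdf = WrappedDataFrame(pd.DataFrame([["x", "y", "z"]], columns=cols), cols[:2])
wdf = wdf.append_row(["a", "b", "c"])
print(wdf.new_property)          # ['A', 'B']
print(wdf.head(1).new_property)  # ['A', 'B']
```

Built-ins such as len() look up dunder methods on the class rather than the instance, so they bypass __getattr__; you would need to define __len__ and similar methods explicitly if you rely on them.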
Upvotes: 5