Reputation: 40878
As mentioned a while back by @piRSquared, subclassing a pandas DataFrame in the way suggested in the docs, or by geopandas' GeoDataFrame, opens up the possibility of unwanted mutation of the original object:
class SubFrame(pd.DataFrame):
def __init__(self, *args, **kwargs):
attr = kwargs.pop('attr', None)
super(SubFrame, self).__init__(*args, **kwargs)
self.attr = attr
@property
def _constructor(self):
return SubFrame
def somefunc(self):
"""Add some extended functionality."""
pass
df = pd.DataFrame([[1, 2], [3, 4]])
sf = SubFrame(df, attr=1)
sf[:] = np.nan # Modifies `df`
print(df)
# 0 1
# 0 NaN NaN
# 1 NaN NaN
The error-prone "fix" is to pass a copy at instantiation:
sf = SubFrame(df.copy(), attr=1)
But this is easily subject to user error. My question is: can I create a copy of self
(the passed DataFrame) within class SubFrame
itself?
How would I go about doing this?
If the answer is "No," I appreciate that too, so that I can scrap this endeavor before wasting time on it.
The pandas docs suggest two alternatives:
pipe
I've already thoroughly considered both of these, so I'd appreciate if answers can refrain from an opinionated discussion with generic reasons about why these 2 alternatives are better/safer.
Upvotes: 2
Views: 87
Reputation: 411
Self
is not the dataframe you are passing. Regardless, you can perform the copy in the init function.
For example
import copy
def __init__(self, farg, **kwargs):
farg = copy.deepcopy(farg)
attr = kwargs.pop('attr', None)
super().__init__(farg)
self.attr = attr
Should show you that the farg is the df you are passing.
I don't know much about subclassing a DataFrame, so if you want to keep you original init structure you can copy all the *args. Can't speak to the safety of this approach.
def __init__(self, *args, **kwargs):
cargs = tuple(copy.deepcopy(arg) for arg in args)
attr = kwargs.pop('attr', None)
super().__init__(*cargs, **kwargs)
self.attr = attr
Upvotes: 1