Brad Solomon

Reputation: 40878

Subclass a DataFrame without mutating original object

As mentioned a while back by @piRSquared, subclassing a pandas DataFrame in the way suggested in the docs, or in the way geopandas' GeoDataFrame does it, opens up the possibility of unwanted mutation of the original object:

import numpy as np
import pandas as pd

class SubFrame(pd.DataFrame):

    def __init__(self, *args, **kwargs):
        attr = kwargs.pop('attr', None)
        super(SubFrame, self).__init__(*args, **kwargs)
        self.attr = attr

    @property
    def _constructor(self):
        return SubFrame

    def somefunc(self):
        """Add some extended functionality."""
        pass

df = pd.DataFrame([[1, 2], [3, 4]])
sf = SubFrame(df, attr=1)

sf[:] = np.nan  # Modifies `df`
print(df)

#     0   1
# 0 NaN NaN
# 1 NaN NaN

The obvious "fix" is to pass a copy at instantiation:

sf = SubFrame(df.copy(), attr=1)

But this is easily subject to user error. My question is: can I create a copy of self (the passed DataFrame) within class SubFrame itself? How would I go about doing this?

If the answer is "No," I appreciate that too, so that I can scrap this endeavor before wasting time on it.


A polite request

The pandas docs suggest two alternatives:

  1. Extensible method chains with pipe
  2. Composition

I've already thoroughly considered both of these, so I'd appreciate if answers can refrain from an opinionated discussion with generic reasons about why these 2 alternatives are better/safer.

Upvotes: 2

Views: 87

Answers (1)

Alex Zisman

Reputation: 411

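self is not the DataFrame you are passing; it is the new SubFrame instance being initialized, and the DataFrame arrives as the first positional argument. Regardless, you can perform the copy in the __init__ function.

For example: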

import copy

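def __init__(self, farg, **kwargs):
    # farg is the object passed in (e.g. the original df); copy it first
    farg = copy.deepcopy(farg)
    attr = kwargs.pop('attr', None)
    # Forward the remaining keyword arguments instead of dropping them
    super().__init__(farg, **kwargs)
    self.attr = attr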

This should show you that farg is the df you are passing.

I don't know much about subclassing a DataFrame, so if you want to keep your original __init__ structure you can copy all the *args. I can't speak to the safety of this approach.

def __init__(self, *args, **kwargs):
    cargs = tuple(copy.deepcopy(arg) for arg in args)
    attr = kwargs.pop('attr', None)
    super().__init__(*cargs, **kwargs)
    self.attr = attr
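
Putting the pieces together, here is a minimal sketch of the full class so the behaviour can be checked end to end. It is a variation on the snippets above, not a vetted pattern: it assumes it is enough to copy only when the first argument is a DataFrame (using DataFrame.copy() rather than copy.deepcopy), and it adds _metadata = ['attr'], which the original class did not have. The example uses float data so that assigning NaN does not change any dtypes.

```python
import numpy as np
import pandas as pd


class SubFrame(pd.DataFrame):
    # Assumption: listing 'attr' here lets pandas carry it over to
    # objects derived from this one; the original class omitted it.
    _metadata = ['attr']

    def __init__(self, data=None, *args, **kwargs):
        attr = kwargs.pop('attr', None)
        # Copy only when handed a DataFrame, so the caller's object
        # never shares memory with the new instance.
        if isinstance(data, pd.DataFrame):
            data = data.copy()
        super().__init__(data, *args, **kwargs)
        self.attr = attr

    @property
    def _constructor(self):
        return SubFrame


df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])
sf = SubFrame(df, attr=1)

sf[:] = np.nan  # Should leave `df` untouched now
print(df)
```

One trade-off to be aware of: pandas may call the constructor internally when building results, so depending on the pandas version this can introduce extra copies on ordinary operations.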

Upvotes: 1
