Chris J Harris

Reputation: 1841

Subclassing pd.DataFrame class results in `object has no attribute '_data'` when trying to display the data

I'm trying to create a (very simple) pandas subclass, like so:

import pandas as pd

data = pd.DataFrame({'A': [1, 2], 'B': [2, 3], 'C': [4, 5]})

class TestFrame(pd.DataFrame):
    # See https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extending-extension-types
    _metadata = pd.DataFrame._metadata + ["addnl"]

    @property
    def _constructor(self):
        return TestFrame

    @property
    def _constructor_sliced(self):
        return pd.Series

    @classmethod
    def plus_one(
        cls,
        df,
    ):
        tf = super().__new__(cls, df)
        tf.addnl = 1
        return tf

t1 = TestFrame.plus_one(data)

This proceeds without error, except that trying to view t1 gives me AttributeError: 'TestFrame' object has no attribute '_data'.

I think this is because I am calling DataFrame.__new__ instead of __init__, because it gives the same error for this:

object.__new__(pd.DataFrame, {'A': [1, 2], 'B': [2, 3], 'C': [4, 5]})

However, I can't then find a way to define the constructor. This is made more problematic by the fact that the pandas subclassing infrastructure doesn't yet (as far as I can tell) let you define an __init__ with new attributes.

Any help much appreciated.

Upvotes: 0

Views: 462

Answers (1)

cs95

Reputation: 402323

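The problem is the line tf = super().__new__(cls, df). As far as I can tell, DataFrame does not define its own __new__, so this call falls through to object.__new__, which only allocates the instance. DataFrame.__init__ never runs, so the internal _data attribute is never set, and that is what the error is reporting. Since you are not overriding DataFrame.__init__ or __new__, there is no need to go through super() to call them.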
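To see the mechanism without pandas, here is a minimal sketch in plain Python. The classes Base and Sub and the attribute payload are hypothetical stand-ins (Base plays the role of DataFrame, with an __init__ that sets _data); this illustrates the general __new__ versus __init__ behaviour rather than pandas internals.

```python
class Base:
    """Hypothetical stand-in for a class whose __init__ sets internal state."""

    def __init__(self, payload):
        self._data = payload


class Sub(Base):
    @classmethod
    def via_new(cls, payload):
        # Resolves to object.__new__: allocates a Sub, but __init__ is skipped.
        return super().__new__(cls, payload)

    @classmethod
    def via_call(cls, payload):
        # Calling the class runs __new__ and then __init__.
        return cls(payload)


broken = Sub.via_new([1, 2])
working = Sub.via_call([1, 2])

print(isinstance(broken, Sub))    # True
print(hasattr(broken, "_data"))   # False
print(hasattr(working, "_data"))  # True
```

The object returned by via_new is a real Sub instance, which is why the original call "proceeds without error"; it only fails later, when something reads the state that __init__ would have set.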

If the idea is to instantiate a frame of type TestFrame, you can use tf = cls(df).

@classmethod
def plus_one(cls, df):
    tf = cls(df)
    tf.addnl = 1

    return tf
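For completeness, here is the whole thing put together as a runnable sketch. It assumes a reasonably recent pandas, and it declares _metadata as a plain list containing only "addnl", which is the form the pandas subclassing docs use; everything else is taken from the question.

```python
import pandas as pd


class TestFrame(pd.DataFrame):
    # Names listed here are treated as instance attributes, not columns.
    _metadata = ["addnl"]

    @property
    def _constructor(self):
        return TestFrame

    @property
    def _constructor_sliced(self):
        return pd.Series

    @classmethod
    def plus_one(cls, df):
        tf = cls(df)  # goes through DataFrame.__init__
        tf.addnl = 1
        return tf


data = pd.DataFrame({'A': [1, 2], 'B': [2, 3], 'C': [4, 5]})
t1 = TestFrame.plus_one(data)

print(type(t1).__name__)  # TestFrame
print(t1.addnl)           # 1
print(t1)                 # displays without the AttributeError
```

Because the instance is built with cls(df), the frame's internals are fully initialised before addnl is attached, so displaying t1 works as usual.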

Upvotes: 2
