bluprince13

Reputation: 5001

How can I get a subclass to return a copy of itself, rather than the parent class it is inheriting from?

I have defined a subclass, NewDataStructure, that inherits from another class. Methods that act on the object itself work fine with this subclass. However, methods that create a copy return an object of the parent class, not the subclass. This causes a lot of issues when I call such a method from within other methods.

Is there a way to specifically instruct that a named method of the parent class should return an object of the subclass?

Is there a way to instruct that all inherited methods should return an object of the subclass, not the parent class?

Perhaps I could pass the returned object to the __init__ function of my class? I'd need to modify my __init__ accordingly... What's the Pythonic way?

import pandas as pd


class NewDataStructure(pd.DataFrame):
    def __init__(self, data, index, title):
        super(NewDataStructure, self).__init__(data=data, index=index)
        self.title = title


new_data_variable = NewDataStructure(data=None, index=None, title="")

changed = new_data_variable.unstack()

new_data_variable.reset_index(inplace=True)
unchanged = new_data_variable

print(type(changed))
print(type(unchanged))

<class 'pandas.core.series.Series'> 
<class '__main__.NewDataStructure'>

Upvotes: 3

Views: 1597

Answers (3)

daphtdazz

Reputation: 8159

I'm afraid your question is a classic XY problem: you are asking how to do Y, which you think is a solution to X, when it's actually not a great solution to X and a different approach to X would probably serve you better.

X is roughly "how do I bind extra functionality on to DataFrame?", and as @ppkt pointed out, this is discussed in this question. The main problem with subclassing mentioned there is the one you're hitting: the class has factory methods that produce new instances of the class, and that is not something you can generally manipulate easily from the subclass.

However, the DataFrame class provides a solution (which is official as of June 2019, see the documentation) via the _constructor property:

class DataFrame(NDFrame):
    ...
    @property
    def _constructor(self):
        return DataFrame

Methods can use this property to create new instances, rather than hard-coding DataFrame. So you can solve your problem by overriding that property on your subclass:

class NewDataStructure(pd.DataFrame):
    ...
    @property
    def _constructor(self):
        return NewDataStructure
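
As a fuller sketch of that override, assuming a reasonably recent pandas: the version below also lists title in _metadata, which the pandas subclassing documentation describes as the way to carry custom attributes over to results, and it gives __init__ default arguments because pandas appears to call the constructor with the data alone. The default values and the sample data are illustrative assumptions, not part of the question.

```python
import pandas as pd


class NewDataStructure(pd.DataFrame):
    # Attributes named here are copied onto results by pandas.
    _metadata = ["title"]

    def __init__(self, data=None, index=None, title="", **kwargs):
        # Defaults matter: pandas may call this with only the data.
        super().__init__(data=data, index=index, **kwargs)
        self.title = title

    @property
    def _constructor(self):
        # pandas uses this to build DataFrame-shaped results.
        return NewDataStructure


original = NewDataStructure({"a": [1, 2]}, title="demo")
copied = original.copy()
print(type(copied).__name__, copied.title)
```

Methods that return a Series, such as the unstack() call in the question, are as far as I can tell governed by the separate _constructor_sliced property, which this sketch leaves untouched.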

This is a generally recognised pattern: instance creation is deferred to a factory / constructor method that users can modify. It is similar to the logging module's ability to set the logger class with logging.setLoggerClass().
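
To make the logging comparison concrete, here is a minimal sketch using only the standard library. TaggedLogger and the logger name "factory_demo" are hypothetical names chosen for illustration.

```python
import logging


class TaggedLogger(logging.Logger):
    """Hypothetical Logger subclass, used only for illustration."""

    def tagged(self, msg):
        self.info("[tagged] %s", msg)


# Tell the logging module which class its factory should instantiate.
logging.setLoggerClass(TaggedLogger)

# getLogger() is the factory; loggers created from now on use the subclass.
log = logging.getLogger("factory_demo")
print(type(log).__name__)
```

The caller never instantiates TaggedLogger directly; the factory does it, which is the same idea as pandas consulting _constructor.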

Upvotes: 8

oOpSgEo

Reputation: 193

!!! I'm on a small smartphone !!!

In the way that you are using the code, you are implementing a recursive call to the function.

As far as I can tell, you created the new_data_variable object correctly, but the functions you are calling are not designed properly, unless they are part of pandas; in that case you would have to create an object for pd each time and reassign.

As for data being passed as None, you would want to incorporate an if statement too, then assign data as an object to self.

I think you can do what you are trying to do; you just need to redesign the class a little and think about the object design.

I am on my way home. I'll edit this for you when I get on my laptop and give you an example.

Upvotes: 0

ppkt

Reputation: 26

I think the same problem was described here: Pandas DataFrame Object Inheritance or Object Use?

As a solution, you should create a wrapper class for the pandas DataFrame.
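
A minimal sketch of what such a wrapper could look like, using composition instead of inheritance. TitledFrame, its _wrap helper, and the choice of which methods to forward are all hypothetical; the linked question may suggest a different design.

```python
import pandas as pd


class TitledFrame:
    """Hypothetical wrapper that holds a DataFrame plus a title."""

    def __init__(self, frame, title=""):
        self.frame = frame
        self.title = title

    def _wrap(self, result):
        # Re-wrap DataFrame results so the title travels with them;
        # anything else (Series, None, scalars) is returned unchanged.
        if isinstance(result, pd.DataFrame):
            return TitledFrame(result, self.title)
        return result

    def head(self, n=5):
        return self._wrap(self.frame.head(n))

    def reset_index(self, **kwargs):
        return self._wrap(self.frame.reset_index(**kwargs))


wrapped = TitledFrame(pd.DataFrame({"a": [1, 2, 3]}), title="demo")
first_two = wrapped.head(2)
print(type(first_two).__name__, first_two.title)
```

The trade-off is that every DataFrame method you need has to be forwarded explicitly, but the wrapper controls exactly what type each call returns.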

Upvotes: 1
