Reputation: 5001
I have defined a subclass NewDataStructure that inherits from another class. Methods that act on the object itself work fine with this subclass. However, methods that create a copy return an object of the parent class, not the subclass. This causes a lot of issues when I call such a method from within other methods.
Is there a way to specifically instruct that a named method of the parent class should return an object of the subclass?
Is there a way to instruct that all inherited methods should return an object of the subclass, not the parent class?
Perhaps I could pass the returned object to the __init__ function of my class? I'd need to modify my __init__ accordingly... What's the Pythonic way?
import pandas as pd

class NewDataStructure(pd.DataFrame):
    def __init__(self, data, index, title):
        super(NewDataStructure, self).__init__(data=data, index=index)
        self.title = title

new_data_variable = NewDataStructure(data=None, index=None, title="")
# unstack() builds and returns a new object
changed = new_data_variable.unstack()
# reset_index(inplace=True) modifies the existing object
new_data_variable.reset_index(inplace=True)
unchanged = new_data_variable
print(type(changed))
print(type(unchanged))
<class 'pandas.core.series.Series'>
<class '__main__.NewDataStructure'>
Upvotes: 3
Views: 1597
Reputation: 8159
I'm afraid your question is a classic XY question: you are asking how to do Y, which you think is a solution to X, when Y is actually not a great solution to X and a different approach to X would probably serve you better.
X is roughly "how do I bind extra functionality on to DataFrame?", and as @ppkt pointed out, this is discussed in this question. The main problem with subclassing mentioned there is the one you're hitting: the class has factory methods that produce new instances of the class, and that is not something you can generally manipulate easily from the subclass.
However, the DataFrame class provides a solution (which is official as of June 2019, see the documentation) via the _constructor property:
class DataFrame(NDFrame):
    ...
    @property
    def _constructor(self):
        return DataFrame
pandas uses this property to create new instances, rather than calling DataFrame directly. So you can solve your problem by overriding that property on your subclass:
class NewDataStructure(pd.DataFrame):
    ...
    @property
    def _constructor(self):
        return NewDataStructure
This is a generally recognised pattern of deferring instance creation to a factory / constructor method that can be modified by users, similar to the logging module's ability to set the logger class with logging.setLoggerClass().
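For reference, here is a minimal, self-contained sketch of that override. The class name TitledFrame and the sample data are made up for illustration. The sketch also assumes two things the snippet above does not show: that the extra attribute is listed in _metadata (which the pandas subclassing documentation describes as the way to have attributes carried over to results), and that title is an optional keyword argument, since pandas calls the constructor internally without it.

```python
import pandas as pd


class TitledFrame(pd.DataFrame):
    """Hypothetical stand-in for NewDataStructure from the question."""

    # Attribute names listed here are copied onto results by pandas.
    _metadata = ["title"]

    def __init__(self, *args, title="", **kwargs):
        # title is keyword-only with a default (an assumption of this sketch),
        # because pandas invokes the constructor internally without it.
        super().__init__(*args, **kwargs)
        self.title = title

    @property
    def _constructor(self):
        # pandas uses this to build DataFrame-shaped results.
        return TitledFrame


frame = TitledFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, title="demo")
first_rows = frame.head(2)   # returns a new object
duplicate = frame.copy()     # returns a new object

print(type(first_rows).__name__, first_rows.title)
print(type(duplicate).__name__, duplicate.title)
```

Methods whose result is a Series, such as the unstack() call in the question, are not covered by this sketch.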
Upvotes: 8
Reputation: 193
!!! I'm on small smart phone !!!
The way you are using the code, you are implementing a recursive call to the function.
As far as I can tell, you created the new_data_variable object correctly, but the functions you are calling are not designed properly, unless they are part of pandas; if that is the case, you would have to create a new object for pd each time and reassign it.
As for data being passed as None, you would want to incorporate an if statement too, then assign data as an object to self.
I think you can do what you are trying to do; you just need to redesign the class a little and think about the object design.
I am on my way home. I'll edit this for you when I get on my laptop and give you an example.
Upvotes: 0
Reputation: 26
I think the same problem was described here: Pandas DataFrame Object Inheritance or Object Use?
As a solution, you should create a wrapper class for the pandas DataFrame.
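A rough sketch of what such a wrapper might look like, assuming composition instead of inheritance. The names TitledData, frame and transform are hypothetical and not taken from the linked question; this is one possible shape, not a recommended API.

```python
import pandas as pd


class TitledData:
    """Hypothetical wrapper that holds a DataFrame instead of subclassing it."""

    def __init__(self, data=None, index=None, title=""):
        self.frame = pd.DataFrame(data=data, index=index)
        self.title = title

    def transform(self, func, *args, **kwargs):
        """Apply func to the wrapped frame and re-wrap DataFrame results."""
        result = func(self.frame, *args, **kwargs)
        if isinstance(result, pd.DataFrame):
            # Keep the title attached to the new frame.
            return TitledData(data=result, title=self.title)
        # Anything else (Series, scalars, ...) is returned unchanged.
        return result


wrapped = TitledData({"a": [3, 1, 2]}, title="demo")
sorted_data = wrapped.transform(pd.DataFrame.sort_values, "a")

print(type(sorted_data).__name__, sorted_data.title)
print(sorted_data.frame["a"].tolist())
```

The trade-off is that DataFrame methods are no longer available directly on the object; callers go through frame or transform, but the wrapper fully controls what type comes back.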
Upvotes: 1