Reputation: 509
First question:
I work with pandas DataFrames and frequently run the same routines as part of data pre-processing and other tasks. I'd like to write some of these routines as methods in a class called ExtendedDataframe that extends pandas.DataFrame, but I don't know how to go about this. So far, I'm not writing any __init__ in my new class, so that it's inherited from pandas.DataFrame:
import pandas
class ExtendedDataframe(pandas.DataFrame):
    def some_method(self):
        ...  # pre-processing routine goes here
This apparently lets me create an instance of ExtendedDataframe by inheritance. But I usually load data through something like pandas.read_csv, which returns a plain DataFrame. How can I load such CSV data and at some point turn it into an ExtendedDataframe, so that I can use my own methods on top of those provided by the standard DataFrame? It's fine if the loading phase returns a standard DataFrame that I then convert into an ExtendedDataframe.
Second question:
Not all of the pandas functionality that I use consists of DataFrame methods. Some are functions, such as pandas.merge, that take DataFrames as arguments. How can I extend the use of such functions to instances of my ExtendedDataframe class? In other words, if df1 and df2 are two instances of ExtendedDataframe, how do I make
pandas.merge(df1, df2, ...)
work just like it would with standard instances of DataFrame?
Upvotes: 6
Views: 11176
Reputation: 1
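You can extend the DataFrame class by assigning a function to it as a class attribute. The function receives the DataFrame instance as its first argument.
For example:
import pandas as pd
def my_custom_class(self):
    # self is the DataFrame the method is called on
    print('hi')
pd.DataFrame.my_custom_class = my_custom_class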
Upvotes: 0
Reputation: 72
You could extend the constructor like this:
import pandas
from datetime import datetime
class ExtendedDataframe(pandas.DataFrame):
    def __init__(self, *args, **kwargs):
        pandas.DataFrame.__init__(self, *args, **kwargs)
        # Record when this instance was created.
        self.created_at = datetime.today()
    def to_csv(self, *args, **kwargs):
        # Add the creation time as a column on a copy, leaving self unchanged.
        copy = self.copy()
        copy["created_at"] = self.created_at
        return pandas.DataFrame.to_csv(copy, *args, **kwargs)
Upvotes: 0
Reputation: 785
Had the same problem today. With a colleague's help I found out that this works:
import pandas as pd
class MyDF(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        super(MyDF, self).__init__(*args, **kwargs)
    @property
    def _constructor(self):
        return MyDF
    def my_custom_method(self):
        print('This actually works!')
Example:
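# DataFrame.append was removed in pandas 2.0, so build and filter instead.
df = MyDF({'a': [1], 'b': ['test']})
df = df[df['a'] > 0]  # the result is still a MyDF thanks to _constructor
print(df)
df.my_custom_method() # prints "This actually works!"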
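To tie this back to the two questions, here is a minimal sketch of wrapping a plain DataFrame returned by pd.read_csv and then passing subclass instances to pd.merge. It assumes a MyDF class like the one above; the CSV text, the column names, and the returned string are made up for illustration.

```python
import io

import pandas as pd


class MyDF(pd.DataFrame):
    @property
    def _constructor(self):
        # Tells pandas which class to use for results of operations.
        return MyDF

    def my_custom_method(self):
        # Hypothetical custom routine; returns a string so it is easy to check.
        return 'This actually works!'


# Hypothetical CSV content standing in for a file on disk.
csv_text = "key,a\n1,x\n2,y\n"

plain = pd.read_csv(io.StringIO(csv_text))  # a plain DataFrame
df1 = MyDF(plain)                           # wrap it in the subclass
df2 = MyDF({'key': [1, 2], 'b': [10, 20]})

# pd.merge takes the two frames as separate positional arguments.
merged = pd.merge(df1, df2, on='key')

print(type(df1).__name__)
print(type(merged).__name__)
print(merged.my_custom_method())
```

Passing an existing DataFrame to the subclass constructor is enough for the conversion step, so the loading code itself does not need to change.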
Upvotes: 9
Reputation: 1441
I am not sure in which version of pandas the decorators for extending DataFrame and the other pandas objects were introduced. You can read more about them at the following address: https://pandas.pydata.org/pandas-docs/stable/development/extending.html
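As a rough sketch of what that page describes, a custom accessor can be registered with the pd.api.extensions.register_dataframe_accessor decorator. The accessor name "prep" and the drop_empty_rows method are hypothetical names chosen for this example.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("prep")
class PrepAccessor:
    """Hypothetical accessor grouping pre-processing routines under df.prep."""

    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is attached to.
        self._obj = pandas_obj

    def drop_empty_rows(self):
        # Drop rows in which every value is missing.
        return self._obj.dropna(how="all")


df = pd.DataFrame({"a": [1.0, None], "b": [2.0, None]})
cleaned = df.prep.drop_empty_rows()
print(cleaned)
```

Because the accessor is attached to DataFrame itself, it is available on frames returned by pd.read_csv or pd.merge without any conversion.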
Upvotes: 3
Reputation: 12417
When you create an instance of your class, it is still a DataFrame object. You can modify existing methods by overriding them, in the form __existingMethod__. As for the second question, I would suggest creating a new class to which you pass the two DataFrames. In that case you will have to write the __init__ method.
Upvotes: 0
Reputation: 1827
This doesn't directly answer your question, but it is a potential answer to your problem. Lots of people use the pipe method in their workflows.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pipe.html
Instead of saying
df = foo(df)
you can say
df = df.pipe(foo)
You can even specify arguments for the function! This will be much easier to maintain than trying to encapsulate the whole dataframe class. So the idea is that you can just create a library of functions and pipe them as needed.
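A small sketch of that idea, with two made-up helper functions (drop_missing and scale_column are illustrative names, not part of pandas):

```python
import pandas as pd


def drop_missing(df):
    # Remove rows that contain any missing value.
    return df.dropna()


def scale_column(df, column, factor=1.0):
    # Return a copy with one column multiplied by a factor.
    out = df.copy()
    out[column] = out[column] * factor
    return out


raw = pd.DataFrame({"x": [1.0, None, 3.0]})

# Extra positional and keyword arguments are forwarded to the function.
result = raw.pipe(drop_missing).pipe(scale_column, "x", factor=10)
print(result)
```

Each function takes a DataFrame and returns a new one, so the steps can be chained in any order and tested on their own.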
Upvotes: 9