Ben

Reputation: 509

How to extend the pandas DataFrame class with my own methods and functions

First question:

I work with pandas DataFrames and frequently run the same routines for data pre-processing and similar tasks. I'd like to write some of these routines as methods of a class called ExtendedDataframe that extends pandas.DataFrame, but I don't know how to go about it. So far, I don't define an __init__ in my new class, so that it is inherited from pandas.DataFrame:

import pandas
class ExtendedDataframe(pandas.DataFrame):
  def some_method(self):
    pass  # placeholder for the actual routine

This apparently lets me create an instance of ExtendedDataframe through inheritance. However, I usually load data with something like pandas.read_csv, which returns a plain DataFrame. How can I load such CSV data and at some point turn it into an ExtendedDataframe, so that I can use my own methods on top of those provided by the standard DataFrame? It's fine if the loading phase returns a standard DataFrame that I then convert into an ExtendedDataframe.

Second question:

Not all the pandas functionality that I use consists of DataFrame methods. Some of it is provided by functions, such as pandas.merge, that take DataFrames as arguments. How can I extend the use of such functions to instances of my ExtendedDataframe class? In other words, if df1 and df2 are two instances of ExtendedDataframe, how do I make

pandas.merge(df1, df2, ...)

work just like it would with standard instances of DataFrame?

Upvotes: 6

Views: 11176

Answers (6)

awales0177

Reputation: 1

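You can extend the DataFrame class by assigning a function to it as a class attribute. The function must accept the DataFrame instance as its first argument (self), otherwise calling it as a method fails.

For example:

import pandas as pd

def my_custom_method(self):
    print('hi')

# Every DataFrame now has a my_custom_method() method
pd.DataFrame.my_custom_method = my_custom_method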

Upvotes: 0

ric

Reputation: 72

You could extend the constructor like this:

import pandas
from datetime import datetime

class ExtendedDataframe(pandas.DataFrame):
  def __init__(self, *args, **kwargs):
    pandas.DataFrame.__init__(self, *args, **kwargs)
    self.created_at = datetime.today()

  def to_csv(self, *args, **kwargs):
    copy = self.copy()
    copy["created_at"] = self.created_at
    # pandas is imported under its full name here, so "pd" would be undefined
    return pandas.DataFrame.to_csv(copy, *args, **kwargs)

Upvotes: 0

s6hebern

Reputation: 785

I had the same problem today. With a colleague's help, I found that this works:

import pandas as pd

class MyDF(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        super(MyDF,  self).__init__(*args, **kwargs)

    @property
    def _constructor(self):
        return MyDF

    def my_custom_method(self):
        print('This actually works!')

Example:

df = MyDF(columns=('a', 'b'))
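# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
df = pd.concat([df, MyDF([{'a': 1, 'b': 'test'}])], ignore_index=True)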
print(df)
df.my_custom_method()  # prints "This actually works!"
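
To connect this pattern to the two parts of the question, here is a minimal sketch of wrapping a plain DataFrame (such as the one pandas.read_csv returns) and passing subclass instances to pandas.merge. The drop_empty_rows method and the inline CSV text are assumptions made up for illustration; they are not part of the original answer.

```python
import io
import pandas as pd

class ExtendedDataframe(pd.DataFrame):
    @property
    def _constructor(self):
        # Tells pandas to build the results of most operations as ExtendedDataframe
        return ExtendedDataframe

    def drop_empty_rows(self):
        # Hypothetical pre-processing routine: remove rows that are entirely NaN
        return self.dropna(how="all")

# read_csv returns a plain DataFrame; passing it to the subclass constructor wraps it
csv_text = "key,x\n1,10\n2,20\n"  # stand-in for a real CSV file
plain = pd.read_csv(io.StringIO(csv_text))
df1 = ExtendedDataframe(plain)
df2 = ExtendedDataframe({"key": [1, 2], "y": ["a", "b"]})

# pandas.merge accepts subclass instances, because they are DataFrames
merged = pd.merge(df1, df2, on="key")
cleaned = df1.drop_empty_rows()
```

Here cleaned is still an ExtendedDataframe because of _constructor. Whether merged comes back as the subclass or as a plain DataFrame may depend on the pandas version, so it is worth checking type(merged) on yours.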

Upvotes: 9

Pero

Reputation: 1441

pandas provides decorators for extending DataFrame and other objects, although I am not sure in which version they were introduced. You can read more about them at the following address: https://pandas.pydata.org/pandas-docs/stable/development/extending.html
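
As a rough illustration of what that page describes, the sketch below registers a custom accessor with pandas.api.extensions.register_dataframe_accessor. The accessor name "prep" and the normalize_columns method are hypothetical names chosen for this example.

```python
import pandas as pd

@pd.api.extensions.register_dataframe_accessor("prep")
class PrepAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is used on
        self._obj = pandas_obj

    def normalize_columns(self):
        # Return a copy with stripped, lower-cased column names
        return self._obj.rename(columns=lambda c: str(c).strip().lower())

df = pd.DataFrame({" Name ": ["a"], "VALUE": [1]})
clean = df.prep.normalize_columns()
```

Because the accessor is attached to the ordinary DataFrame class, it is also available on frames returned by pandas.read_csv or pandas.merge, with no conversion step.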

Upvotes: 3

Joe

Reputation: 12417

When you create an instance of your class, it is still a DataFrame object. You can modify existing methods by overriding them in this way: __existingMethod__. As for the second question, I would suggest creating a new class to which you pass the two DataFrames. In that case you will have to write the __init__ method.
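
One possible reading of that suggestion, as a sketch only: a small class that holds both DataFrames and delegates to pandas.merge. The class name FramePair and its merged method are hypothetical.

```python
import pandas as pd

class FramePair:
    """Hypothetical wrapper that holds two DataFrames (composition, not inheritance)."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def merged(self, **kwargs):
        # Delegate to pandas.merge; keyword arguments are passed through unchanged
        return pd.merge(self.left, self.right, **kwargs)

pair = FramePair(
    pd.DataFrame({"key": [1, 2], "x": [10, 20]}),
    pd.DataFrame({"key": [2, 3], "y": ["b", "c"]}),
)
inner = pair.merged(on="key")  # default inner join keeps only key == 2
```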

Upvotes: 0

Gabriel A

Reputation: 1827

This doesn't directly answer your question, but it is a potential solution to your problem. Lots of people use the pipe method in their workflows.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pipe.html

Instead of saying

df = foo(df)

you can say

df = df.pipe(foo)

You can even specify arguments for the function! This will be much easier to maintain than trying to encapsulate the whole dataframe class. So the idea is that you can just create a library of functions and pipe them as needed.
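
A short sketch of what such a function library might look like, including arguments passed through pipe. The function names drop_missing and add_total and the sample data are assumptions for illustration.

```python
import pandas as pd

def drop_missing(df, subset):
    # Remove rows that have NaN in any of the given columns
    return df.dropna(subset=subset)

def add_total(df, cols, name="total"):
    # Add a column holding the row-wise sum of the given columns
    return df.assign(**{name: df[cols].sum(axis=1)})

raw = pd.DataFrame({"a": [1.0, None, 3.0], "b": [10.0, 20.0, 30.0]})

# Extra positional and keyword arguments to pipe are forwarded to the function
result = (
    raw.pipe(drop_missing, subset=["a"])
       .pipe(add_total, cols=["a", "b"], name="total")
)
```

Each step returns a new DataFrame, so raw is left unchanged and the steps can be reordered or reused in other pipelines.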

Upvotes: 9
