baxx

Reputation: 4755

How to write a python function that can be used with pandas method chaining

This is my initial approach:

In [91]: def f(dataframe,col):
    ...:     dataframe[col] = dataframe[col]*0

But this failed with the following:

In [90]: df=pd.DataFrame({'a':[1,2],'b':[4,5]})

In [91]: def f(dataframe,col):
    ...:     dataframe[col] = dataframe[col]*0
    ...:

In [92]: df.f('a')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-92-e1a104c6b712> in <module>
----> 1 df.f('a')

~/.virtualenvs/this-env/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5177             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5178                 return self[name]
-> 5179             return object.__getattribute__(self, name)
   5180
   5181     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'f'

I assumed that this would be fairly well documented, but I can't find an example anywhere.

Upvotes: 0

Views: 141

Answers (1)

James

Reputation: 36791

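What you are trying to do is called monkey-patching. Write the function as a method, so that its first parameter (conventionally named self) receives the DataFrame it is called on. Then assign the function as an attribute of the pd.DataFrame class, not of an instantiated object. The method should also return the DataFrame: your original f returns None, so nothing could be chained after it.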

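import pandas as pd

# Written as a method: the DataFrame it is called on arrives as `self`
def f(self, col):
    self.loc[:, col] = self.loc[:, col] * 0
    return self  # returning the DataFrame is what allows further chained calls

# Attach to the class (not an instance) so every DataFrame gets a .f() method
pd.DataFrame.f = f

df = pd.DataFrame({'a': [1, 2], 'b': [4, 5]})
df.f('a')
# returns:
#    a  b
# 0  0  4
# 1  0  5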

Keep in mind that your method as written modifies the DataFrame in place. If you need to preserve the original DataFrame, call self.copy() at the top of your function, then modify and return the copy instead.
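A minimal sketch of the copy-based variant. The method name zero_column is hypothetical (it is not part of pandas) and is used only for illustration. Because each call returns a new DataFrame, calls can be chained without touching df. The last lines show DataFrame.pipe, a built-in pandas method that calls a function with the DataFrame as its first argument. This is a different technique from monkey-patching, and it gives the same chaining without modifying the pd.DataFrame class.

```python
import pandas as pd


def zero_column(self, col):
    """Return a copy of the DataFrame with `col` multiplied by 0 (hypothetical helper)."""
    out = self.copy()          # work on a copy so the caller's DataFrame is preserved
    out[col] = out[col] * 0
    return out                 # return the new frame so further calls can be chained


# Monkey-patch onto the class; `zero_column` is a hypothetical name, not a pandas method
pd.DataFrame.zero_column = zero_column

df = pd.DataFrame({'a': [1, 2], 'b': [4, 5]})

# Chained calls: each call returns a new DataFrame
result = df.zero_column('a').zero_column('b')
print(result)
print(df)  # original still has a = [1, 2] and b = [4, 5]

# Alternative without monkey-patching: pipe passes df as the first argument
piped = df.pipe(zero_column, 'a')
print(piped)
```

With pipe, any ordinary function whose first parameter is a DataFrame and which returns a DataFrame can be used in a chain, so nothing has to be attached to the class.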

Upvotes: 2
