BMichell

Reputation: 3771

Pass Additional Arguments to Pandas Custom Accessor

I am writing a custom accessor class for a pandas DataFrame. I followed the examples here and it works well. However, I have one function to which I would like to pass additional arguments.

I have created this function within my accessor class:

    @property
    def accessor_function(self, time_window=0.5):
        def group_function(df, time):
            fl = df.loc[df.Type_num==0]
            id = fl.Time.idxmin()
            threshold = df.loc[id, 'column'] + time
            return fl.loc[fl.Time<threshold]

        self.Subset = self._obj.groupby(by=['col_1','col_2']).apply(group_function, time_window)
        self.Subset.reset_index(drop=True, inplace=True)

        return self.Subset

If I call it without arguments, it works and uses the default time_window=0.5:

df.accessor.accessor_function

However, if I try to pass a different value for the keyword argument:

df.accessor.accessor_function(time_window = 1)

I get an error:

TypeError: 'DataFrame' object is not callable

I can't find any documentation that explains how to pass args or kwargs to custom accessors, so I'm not sure whether what I'm attempting is even possible. It would be good to understand how to move forward.

Ben

Upvotes: 3

Views: 831

Answers (1)

user7440787

Reputation: 841

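I believe the problem is that you are using the property decorator on something that is really a method. With @property, df.accessor.accessor_function already runs the function and returns the resulting DataFrame, so adding (time_window=1) tries to call that DataFrame, hence the TypeError. If you remove the decorator, it should work fine; see the example below: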

import pandas as pd


@pd.api.extensions.register_dataframe_accessor("accessor")
class MyAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    # No @property here: a plain method can receive arguments.
    def accessor_function(self, time_window=0.5):
        def group_function(df, time):
            # Rows of this group with Type_num == 0
            fl = df.loc[df.Type_num == 0]
            # Index label of the earliest of those rows
            first = fl.Time.idxmin()
            threshold = df.loc[first, 'column'] + time
            return fl.loc[fl.Time < threshold]

        # Extra positional arguments to apply() are forwarded to group_function.
        self.Subset = self._obj.groupby(['col_1', 'col_2']).apply(group_function, time_window)
        self.Subset.reset_index(drop=True, inplace=True)

        return self.Subset

The default case is:

>>> a = pd.DataFrame({'Type_num': [False, False, False, False, False],
...                   'Time': [1, 2, 0.1, 0.2, 0.5],
...                   'col_1': ['A', 'B', 'C', 'D', 'E'],
...                   'col_2': ['A', 'A', 'C', 'E', 'E'],
...                   'column': [0.2, 0.2, 0.2, 0.2, 0.2]})
>>> a.accessor.accessor_function()
   Type_num  Time col_1 col_2  column
0     False   0.1     C     C     0.2
1     False   0.2     D     E     0.2
2     False   0.5     E     E     0.2

You can also pass a custom time_window:

>>> a.accessor.accessor_function(time_window=1)
   Type_num  Time col_1 col_2  column
0     False   1.0     A     A     0.2
1     False   0.1     C     C     0.2
2     False   0.2     D     E     0.2
3     False   0.5     E     E     0.2

Or pass that parameter by unpacking *args or **kwargs:

>>> a.accessor.accessor_function(*[2])
   Type_num  Time col_1 col_2  column
0     False   1.0     A     A     0.2
1     False   2.0     B     A     0.2
2     False   0.1     C     C     0.2
3     False   0.2     D     E     0.2
4     False   0.5     E     E     0.2

>>> a.accessor.accessor_function(**{'time_window':0.1})
   Type_num  Time col_1 col_2  column
0     False   0.1     C     C     0.2
1     False   0.2     D     E     0.2

Upvotes: 5
