SarahD

Reputation: 91

How do I pass a pandas method as a parameter?

I have a function which calculates the mode of columns of a pandas dataframe:

def my_func(df):
    for col in df.columns:
        stat = df[col].mode()
        print(stat)

But I would like to make it more generic, so that I can change which statistic I calculate (e.g. mean, max, ...). I tried to pass the method mode() as an argument to my function:

def my_func(df, pandas_stat):
    for col in df.columns:
        stat = df[col].pandas_stat()
        print(stat)

after referring to the question "How do I pass a method as a parameter in Python".

However, this doesn't seem to work for me. Using a simple example:

> A
     a    b
0  1.0  2.0
1  2.0  4.0
2  2.0  6.0
3  3.0  NaN
4  NaN  4.0
5  3.0  NaN
6  2.0  6.0
7  4.0  6.0

It doesn't recognise the command mode:

> my_func(A, mode)
Traceback (most recent call last):

  File "<ipython-input-332-c137de83a530>", line 1, in <module>
    my_func(A, mode)

NameError: name 'mode' is not defined

So I tried pd.DataFrame.mode:

> my_func(A, pd.DataFrame.mode)
Traceback (most recent call last):

  File "<ipython-input-334-dd913410abd0>", line 1, in <module>
    my_func(A, pd.DataFrame.mode)

  File "<ipython-input-329-8acf337bce92>", line 3, in my_func
    stat = df[col].pandas_stat()

  File "/anaconda3/envs/py36/lib/python3.6/site-packages/pandas/core/generic.py", line 4376, in __getattr__
    return object.__getattribute__(self, name)

AttributeError: 'Series' object has no attribute 'pandas_stat'

Is there a way to pass the mode function?

Upvotes: 2

Views: 2361

Answers (2)

Mikhail Stepanov

Reputation: 3790

You can do this with the built-in getattr and the __name__ attribute, but I think it makes your code somewhat unclear. Maybe a better approach exists.

df = pd.DataFrame({'col1': list(range(5)), 'col2': list(range(5, 0, -1))})
df
Out:
   col1  col2
0     0     5
1     1     4
2     2     3
3     3     2
4     4     1

Define my_func this way and apply it to df:

def my_func(df, pandas_stat):
    for col in df.columns:
        stat = getattr(df[col], pandas_stat.__name__)()
        print(stat)

my_func(df, pd.DataFrame.mean)
Out:
2.0
3.0

Explanation: pd.DataFrame.mean has a __name__ attribute whose value is the string 'mean'. getattr uses that string to look up the method of the same name on each column df[col] (a pd.Series object), and then you can call it.
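
To see the two steps separately, here is a small sketch using the same df as above: __name__ turns the method object into the plain string 'mean', and getattr uses that string to fetch the method from one column (the variable names name and col_mean are just for illustration).

```python
import pandas as pd

df = pd.DataFrame({'col1': list(range(5)), 'col2': list(range(5, 0, -1))})

# Step 1: the method object carries its own name as a plain string
name = pd.DataFrame.mean.__name__
print(name)  # mean

# Step 2: getattr looks that name up on a column (a Series) and returns a bound method
col_mean = getattr(df['col1'], name)
print(col_mean())  # 2.0, the same result as df['col1'].mean()
```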

You can even pass arguments, if you need to:

def my_func(df, pandas_stat, *args, **kwargs):
    for col in df.columns:
        stat = getattr(df[col], pandas_stat.__name__)(*args, **kwargs)
        print(stat)

my_func(df, pd.DataFrame.apply, lambda x: x ** 2)
Out: 
0     0
1     1
2     4
3     9
4    16
Name: col1, dtype: int64
0    25
1    16
2     9
3     4
4     1
Name: col2, dtype: int64
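
Applied to the frame A from the question, the same approach handles mode and also passes keyword arguments through. This is a sketch of a variation that collects the results in a dict instead of printing them; the name my_stats is made up for this example, and A is rebuilt from the table in the question.

```python
import numpy as np
import pandas as pd

# The frame A from the question
A = pd.DataFrame({
    'a': [1.0, 2.0, 2.0, 3.0, np.nan, 3.0, 2.0, 4.0],
    'b': [2.0, 4.0, 6.0, np.nan, 4.0, np.nan, 6.0, 6.0],
})

def my_stats(df, pandas_stat, *args, **kwargs):
    """Hypothetical helper: call the method named pandas_stat.__name__ on every column."""
    name = pandas_stat.__name__
    return {col: getattr(df[col], name)(*args, **kwargs) for col in df.columns}

# Series.mode returns a Series (there can be ties); NaN is dropped by default
modes = my_stats(A, pd.DataFrame.mode)
print(modes['a'].tolist(), modes['b'].tolist())  # [2.0] [6.0]

# Keyword arguments are forwarded unchanged, here q=0.5 (the median)
medians = my_stats(A, pd.DataFrame.quantile, q=0.5)
print(medians['a'], medians['b'])  # 2.0 5.0
```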

But again, I think this approach is a little confusing.
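
One possibly simpler alternative, which is not part of the approach above: a method looked up on the class, such as pd.Series.mean, is an ordinary function whose first argument is the Series, so you can call it directly on each column and skip getattr and __name__ entirely. This sketch assumes you pass pd.Series methods; passing a pd.DataFrame method here would call a DataFrame method on a Series, which is not expected to work.

```python
import pandas as pd

df = pd.DataFrame({'col1': list(range(5)), 'col2': list(range(5, 0, -1))})

def my_func(df, pandas_stat, *args, **kwargs):
    for col in df.columns:
        # pd.Series.mean(s) is the same call as s.mean()
        stat = pandas_stat(df[col], *args, **kwargs)
        print(stat)

my_func(df, pd.Series.mean)                     # prints 2.0, then 3.0
my_func(df, pd.Series.apply, lambda x: x ** 2)  # prints the squared columns
```

A side effect of this design is that any function taking a Series works too, for example my_func(df, lambda s: s.max() - s.min()).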

Edit: about the error in the question:

> my_func(A, pd.DataFrame.mode)
Traceback (most recent call last):

  File "<ipython-input-334-dd913410abd0>", line 1, in <module>
    my_func(A, pd.DataFrame.mode)

  File "<ipython-input-329-8acf337bce92>", line 3, in my_func
    stat = df[col].pandas_stat()

  File "/anaconda3/envs/py36/lib/python3.6/site-packages/pandas/core/generic.py", line 4376, in __getattr__
    return object.__getattribute__(self, name)

AttributeError: 'Series' object has no attribute 'pandas_stat'

When df[col].pandas_stat() is executed, the dot operator invokes the __getattribute__ method of the Series df[col] (and, when that fails, pandas' own __getattr__, which is the frame shown in the traceback). This is analogous to getattr: the object (self) is passed automatically as the first argument, and the second argument is the name of the attribute, which is literally 'pandas_stat' in your code, not the value of your pandas_stat variable. Execution stops there because a pandas Series has no attribute with that name.
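
A minimal sketch of this behaviour on a small made-up Series s: the dot syntax ignores the variable and uses the literal identifier, while getattr uses the variable's value.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4])
pandas_stat = "mean"

# Dot syntax always looks up the literal identifier "pandas_stat"
try:
    s.pandas_stat
except AttributeError as err:
    print(err)  # 'Series' object has no attribute 'pandas_stat'

# The equivalent getattr lookup with the literal string fails the same way
print(hasattr(s, "pandas_stat"))  # False

# getattr with the variable uses its value, the string "mean"
print(getattr(s, pandas_stat)())  # 2.0
```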

If you pass getattr the name of an actual method ('mean', 'apply' and so on), it finds that method through the class of the object (pd.Series and its base classes) and returns it already bound to the Series, so you can call it with the (*args, **kwargs) syntax.
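
A short sketch of that last point, again on a made-up Series s: the object returned by getattr is bound to s, so extra arguments (here the quantile level) go straight through, and the call matches calling the method via the class with s as the first argument.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4])

# Looked up on type(s), i.e. pd.Series and its base classes
method = getattr(s, "quantile")
print(method.__self__ is s)  # True: the method is already bound to s

# Calling the bound method equals calling the class attribute with s first
print(method(0.25), pd.Series.quantile(s, 0.25))  # 1.0 1.0
```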

Upvotes: 6

Bubastis

Reputation: 102

You can do this with getattr:

def my_func(df, pandas_stat):
    for col in df.columns:
        # pandas_stat is a method name such as "max";
        # the empty parentheses are required to call the method
        print(getattr(df[col], pandas_stat)())

my_func(df, "max")  # prints the maximum of each column (my_func returns None)
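
Because the statistic is just a string here, it is easy to loop over several of them. A minimal sketch on the frame A from the question, collecting the results in a nested dict rather than printing them (the name stats_by_name is hypothetical):

```python
import numpy as np
import pandas as pd

# The frame A from the question
A = pd.DataFrame({
    'a': [1.0, 2.0, 2.0, 3.0, np.nan, 3.0, 2.0, 4.0],
    'b': [2.0, 4.0, 6.0, np.nan, 4.0, np.nan, 6.0, 6.0],
})

def stats_by_name(df, names):
    """Hypothetical helper: {stat name: {column: result}} via getattr with strings."""
    return {name: {col: getattr(df[col], name)() for col in df.columns}
            for name in names}

results = stats_by_name(A, ["max", "min", "mode"])
print(results["max"]["a"], results["max"]["b"])  # 4.0 6.0
print(results["min"]["a"], results["min"]["b"])  # 1.0 2.0
print(results["mode"]["a"].tolist())             # [2.0]
```

A misspelled name such as "maxx" raises the same kind of AttributeError as in the question, because getattr cannot find a method with that name.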

Upvotes: 3
