Reputation: 91
I have a function which calculates the mode of columns of a pandas dataframe:
def my_func(df):
    for col in df.columns:
        stat = df[col].mode()
        print(stat)
But I would like to make it more generic so that I can change which statistic I calculate, e.g. mean, max, etc. I tried to pass the method mode() as an argument to my function:
def my_func(df, pandas_stat):
    for col in df.columns:
        stat = df[col].pandas_stat()
        print(stat)
having referred to: How do I pass a method as a parameter in Python
However this doesn't seem to work for me. Using a simple example:
> A
     a    b
0  1.0  2.0
1  2.0  4.0
2  2.0  6.0
3  3.0  NaN
4  NaN  4.0
5  3.0  NaN
6  2.0  6.0
7  4.0  6.0
It doesn't recognise the command mode:
> my_func(A, mode)
Traceback (most recent call last):
  File "<ipython-input-332-c137de83a530>", line 1, in <module>
    my_func(A, mode)
NameError: name 'mode' is not defined
so I tried pd.DataFrame.mode:
> my_func(A, pd.DataFrame.mode)
Traceback (most recent call last):
  File "<ipython-input-334-dd913410abd0>", line 1, in <module>
    my_func(A, pd.DataFrame.mode)
  File "<ipython-input-329-8acf337bce92>", line 3, in my_func
    stat = df[col].pandas_stat()
  File "/anaconda3/envs/py36/lib/python3.6/site-packages/pandas/core/generic.py", line 4376, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'pandas_stat'
Is there a way to pass the mode function?
Upvotes: 2
Views: 2361
Reputation: 3790
You can use the built-in getattr together with the __name__ attribute to do this, although I think it makes the code somewhat unclear. A better approach may exist.
import pandas as pd
df = pd.DataFrame({'col1': list(range(5)), 'col2': list(range(5, 0, -1))})
df
Out:
   col1  col2
0     0     5
1     1     4
2     2     3
3     3     2
4     4     1
Define my_func this way and apply it to df:
def my_func(df, pandas_stat):
    for col in df.columns:
        stat = getattr(df[col], pandas_stat.__name__)()
        print(stat)
my_func(df, pd.DataFrame.mean)
Out:
2.0
3.0
Explanation: pd.DataFrame.mean has a __name__ attribute whose value is 'mean'. getattr uses that name to look up the method of the same name on df[col] (a Series), and then you can call it.
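The two steps in the explanation above can be looked at separately. This is only a minimal sketch using the same df as in this answer; the variable names method_name and bound_method are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': list(range(5)), 'col2': list(range(5, 0, -1))})

# Step 1: the function object carries its own name as a string.
method_name = pd.DataFrame.mean.__name__

# Step 2: getattr looks that name up on the column (a Series),
# which gives the same bound method as writing df['col1'].mean.
bound_method = getattr(df['col1'], method_name)

# Step 3: calling the bound method computes the statistic.
result = bound_method()
print(method_name, result)
```

So pd.DataFrame.mean is only used here as a carrier for the string 'mean'; the method that actually runs is the Series one.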
You can even pass arguments, if you need them:
def my_func(df, pandas_stat, *args, **kwargs):
    for col in df.columns:
        stat = getattr(df[col], pandas_stat.__name__)(*args, **kwargs)
        print(stat)
my_func(df, pd.DataFrame.apply, lambda x: x ** 2)
Out:
0     0
1     1
2     4
3     9
4    16
Name: col1, dtype: int64
0    25
1    16
2     9
3     4
4     1
Name: col2, dtype: int64
But I repeat, I guess this approach is a little confusing.
Edit
About the error:
> my_func(A, pd.DataFrame.mode)
Traceback (most recent call last):
  File "<ipython-input-334-dd913410abd0>", line 1, in <module>
    my_func(A, pd.DataFrame.mode)
  File "<ipython-input-329-8acf337bce92>", line 3, in my_func
    stat = df[col].pandas_stat()
  File "/anaconda3/envs/py36/lib/python3.6/site-packages/pandas/core/generic.py", line 4376, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'pandas_stat'
When df[col].pandas_stat() is executed, the dot operator invokes the __getattribute__ method of df[col], which is a Series. It is analogous to getattr, but it receives self as the first argument automatically.
The second argument is the name of the attribute, which is literally 'pandas_stat' in your code; the value you passed in the pandas_stat parameter is never consulted. Execution fails because a Series has no attribute with that name.
If you pass the name of an actual method ('mean', 'apply' and so on) to getattr, it finds that method through the class of df[col] (pd.Series and its base classes) and returns it bound to the object. You can then call it with the (*args, **kwargs) syntax.
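To make the difference concrete, here is a small sketch (the Series s and the variable names are illustrative, not taken from the question) showing that the dot operator uses the literal attribute name, while getattr uses the string stored in the variable:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 2.0, 3.0])
pandas_stat = "mode"  # the name we actually want to look up

# The dot operator searches for an attribute literally called
# "pandas_stat"; the variable's value plays no role here.
try:
    s.pandas_stat()
    literal_lookup_failed = False
except AttributeError as exc:
    literal_lookup_failed = True
    message = str(exc)

# getattr takes the name as a string, so the variable's value is used.
mode_values = getattr(s, pandas_stat)().tolist()
print(literal_lookup_failed, mode_values)
```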
Upvotes: 6
Reputation: 102
You can do this with getattr:
def my_func(df, pandas_stat):
    for col in df.columns:
        # the empty parentheses are required to call the method
        print(getattr(df[col], pandas_stat)())
my_func(df, "max")  # pass the method name as a string
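A possible variation, offered only as a sketch: since a method accessed on the class is a plain function, it can also be called directly with the column as its first argument, which avoids the __name__ round trip. The helper name column_stats is hypothetical, and it returns the results in a dict instead of printing them. Note that the callable should be a Series method such as pd.Series.mode, because each column is a Series.

```python
import pandas as pd

def column_stats(df, stat):
    """Apply stat to every column of df.

    stat is either a method name (str) or a callable that takes a Series.
    """
    results = {}
    for col in df.columns:
        if callable(stat):
            # e.g. pd.Series.mode(df[col]) is equivalent to df[col].mode()
            results[col] = stat(df[col])
        else:
            results[col] = getattr(df[col], stat)()
    return results

nan = float("nan")
A = pd.DataFrame({
    "a": [1.0, 2.0, 2.0, 3.0, nan, 3.0, 2.0, 4.0],
    "b": [2.0, 4.0, 6.0, nan, 4.0, nan, 6.0, 6.0],
})

stats_by_name = column_stats(A, "max")
stats_by_callable = column_stats(A, pd.Series.mode)
print(stats_by_name)
print({col: values.tolist() for col, values in stats_by_callable.items()})
```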
Upvotes: 3