Custom Aggregate Function in Python

Question

I have been struggling with a custom aggregate function in pandas and have not been able to figure it out. Consider the following data frame:

import numpy as np
import pandas as pd
df = pd.DataFrame({'value': np.arange(1, 5), 'weights': np.arange(1, 5)})

Now, if I want to calculate the average of the value column using agg in pandas, it would be:

df.agg({'value': 'mean'})

which results in a scalar value of 2.5.

However, if I define the following custom mean function:

def my_mean(vec):
    return np.mean(vec)

and use it in the following code:

df.agg({'value': my_mean})

I get a result with one value per row rather than a single aggregated value.

So the question is: what should I do to get the same result as the default mean aggregate function? One more thing to note: if I call mean as a method inside the custom function (shown below), it works just fine. However, I would like to know how to use np.mean in my custom function. Any help would be much appreciated!

def my_mean2(vec):
    return vec.mean()

Mohammad Jafar Mashhadi · Accepted Answer

When you pass a callable as the aggregate function and that callable is not one of the predefined ones such as np.mean or np.sum, pandas treats it as a transform and acts like df.apply(), calling it on each value.
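To make that behaviour concrete, here is a minimal sketch that emulates it in plain Python. The helper agg_like is hypothetical and is not pandas's actual implementation; it only mimics the "try each value first, fall back to the whole column if that fails" order described here, and the exact behaviour may differ between pandas versions. It also suggests why the vec.mean() variant from the question works: a plain int has no .mean() method, so the per-value attempt fails and the whole Series is used instead.

```python
import numpy as np
import pandas as pd

def my_mean(vec):
    return np.mean(vec)   # also works on a single int, so no error is raised

def my_mean2(vec):
    return vec.mean()     # a plain int has no .mean(), so this raises AttributeError

def agg_like(series, func):
    """Hypothetical emulation: try func on each value, else on the whole Series."""
    try:
        # tolist() yields plain Python scalars, one per row
        return pd.Series([func(x) for x in series.tolist()], index=series.index)
    except (ValueError, AttributeError, TypeError):
        return func(series)

s = pd.Series(np.arange(1, 5), name='value')
per_row = agg_like(s, my_mean)    # one value per row: 1.0, 2.0, 3.0, 4.0
whole = agg_like(s, my_mean2)     # falls back to the whole Series: 2.5
```

The only difference between the two calls is whether the function raises on a scalar input, which is exactly what the workaround in this answer exploits.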

The way around this is to let pandas know that your callable expects a vector of values. A crude way to do that is something like:

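def my_mean(vals):
    print(type(vals))  # shows what pandas passes in on each call
    try:
        vals.shape  # a plain int has no .shape attribute
    except AttributeError:
        # reject scalars so pandas retries with the whole column
        raise TypeError("expected a vector of values, got a scalar")

    return np.mean(vals)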

>>> df.agg({'value': my_mean})

 
value    2.5
dtype: float64

As you can see, pandas first tries to call the function on each row (as df.apply would), but my_mean raises a TypeError, so on the second attempt pandas passes the whole column as a Series object. Comment out the try...except part and you'll see that my_mean is called on each row with an int argument.
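If the guard feels too hacky, one possible alternative (a sketch, not part of the original answer) is to avoid agg for this case and hand the column to the function explicitly. It assumes the same df as in the question and relies on DataFrame.apply with its default axis=0, which calls the function once per column with a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': np.arange(1, 5), 'weights': np.arange(1, 5)})

def my_mean(vec):
    return np.mean(vec)

# DataFrame.apply with the default axis=0 calls my_mean once per column,
# passing that column as a Series, so the result is one mean per column.
col_means = df[['value']].apply(my_mean)

# Calling the function on the column directly returns the plain number.
direct = my_mean(df['value'])
```

Here col_means is a Series indexed by column name, matching the shape of the df.agg({'value': 'mean'}) result, while direct is just the number.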
