Pandas groupby: how to apply aggregation function for each parameter in list of parameters

Question

I have a dataframe like this:

import pandas as pd

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})

I would like to group by a column and apply an aggregation function several times. The number of times the function runs, and the parameter it runs with each time, should be dynamic: the output should depend on a list of parameters.

Example:

Let's say I want to group by kind and calculate the mean of height, the mean of height + 1, and the mean of height + 2. Then I could run:

parameters = [0,1,2]

animals.groupby(['kind']).agg(
    mean_height = ('height', lambda x: x.mean() + parameters[0]),
    mean_height_plus_1 = ('height', lambda x: x.mean() + parameters[1]),
    mean_height_plus_2 = ('height', lambda x: x.mean() + parameters[2]))

However, this requires me to know the length of the parameter list in advance. I'd like to be able to change my mind later and do the same for parameters = [0,1,2,359] without manually changing the code to this:

animals.groupby(['kind']).agg(
    mean_height = ('height', lambda x: x.mean() + parameters[0]),
    mean_height_plus_1 = ('height', lambda x: x.mean() + parameters[1]),
    mean_height_plus_2 = ('height', lambda x: x.mean() + parameters[2]),
    mean_height_plus_359 = ('height', lambda x: x.mean() + parameters[3]))

Quang Hoang · Accepted Answer

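You can, for example, define a function that takes the parameters and apply it to the height column of each group:

import numpy as np

def get_mean(x, params):
    # x is the 'height' Series of one group; adding the array yields one value per parameter
    return pd.Series(x.mean() + np.array(params),
                     index = [f'mean_plus_{i}' for i in params])

animals.groupby('kind')['height'].apply(get_mean, parameters)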

Output:

kind             
cat   mean_plus_0     9.3
      mean_plus_1    10.3
      mean_plus_2    11.3
dog   mean_plus_0    20.0
      mean_plus_1    21.0
      mean_plus_2    22.0
Name: height, dtype: float64
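
If you prefer one column per parameter instead of the stacked result above, the inner index level can be pivoted into columns. The following is a minimal sketch, assuming the apply call is made on the height column (as the `Name: height` in the output suggests); the names `long_result` and `wide_result` are only illustrative.

```python
import numpy as np
import pandas as pd

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})
parameters = [0, 1, 2]

def get_mean(x, params):
    # One value per parameter, labelled by the parameter it was computed with
    return pd.Series(x.mean() + np.array(params),
                     index=[f'mean_plus_{i}' for i in params])

# Stacked result: a Series indexed by (kind, label)
long_result = animals.groupby('kind')['height'].apply(get_mean, parameters)

# Move the inner index level (the labels) into columns
wide_result = long_result.unstack()
print(wide_result)
```

The list of parameters can then grow or shrink without touching the rest of the code.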

Alternatively, you can use a for loop:

groups = animals.groupby('kind')
ret_df = pd.DataFrame()

for i in parameters:
    ret_df[f'mean_plus_{i}'] = groups['height'].mean() + i

Output:

      mean_plus_0  mean_plus_1  mean_plus_2
kind                                       
cat           9.3         10.3         11.3
dog          20.0         21.0         22.0
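
To stay closer to the named-aggregation syntax from the question, one possible variant is to build the keyword arguments in a dict comprehension and unpack them into `agg`. This is a sketch rather than part of the original answer; the name `named_aggs` is hypothetical. Note the `p=p` default argument: it binds the current parameter value to each lambda, since otherwise every lambda would look up `p` only when called and see its final value.

```python
import pandas as pd

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})
parameters = [0, 1, 2, 359]

# One (column, function) pair per parameter, keyed by the output column name
named_aggs = {
    f'mean_height_plus_{p}': ('height', lambda x, p=p: x.mean() + p)
    for p in parameters
}

# Unpack the dict as keyword arguments, exactly as if they were typed by hand
result = animals.groupby('kind').agg(**named_aggs)
print(result)
```

This recomputes the group mean once per parameter, so for long parameter lists the loop above, which adds each parameter to an already computed mean, is likely cheaper.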
