Reputation: 2644
I have a dataframe like this:
import pandas as pd

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})
I would like to group by a column and apply an aggregation function several times. The number of times the function runs, and the parameter it runs with each time, should be dynamic: the output should depend on a list of parameters.
Example:
Let's say I want to group by kind and calculate the mean of height, the mean of height + 1, and the mean of height + 2. Then I could run:
parameters = [0, 1, 2]
animals.groupby(['kind']).agg(
    mean_height=('height', lambda x: x.mean() + parameters[0]),
    mean_height_plus_1=('height', lambda x: x.mean() + parameters[1]),
    mean_height_plus_2=('height', lambda x: x.mean() + parameters[2]))
However, this requires me to know the length of the parameter list in advance. I'd like to be able to change my mind later and do the same for parameters = [0,1,2,359] without having to manually change the code to this:
animals.groupby(['kind']).agg(
    mean_height=('height', lambda x: x.mean() + parameters[0]),
    mean_height_plus_1=('height', lambda x: x.mean() + parameters[1]),
    mean_height_plus_2=('height', lambda x: x.mean() + parameters[2]),
    mean_height_plus_359=('height', lambda x: x.mean() + parameters[3]))
Upvotes: 1
Views: 231
Reputation: 150785
You can, for example, define a function that takes params and apply it:
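import numpy as np

def get_mean(x, params):
    # x is one group's height Series; returns one labelled value per parameter
    return pd.Series(x.mean() + np.array(params),
                     index=[f'mean_plus_{i}' for i in params])

animals.groupby('kind')['height'].apply(get_mean, parameters)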
Output:
kind
cat   mean_plus_0     9.3
      mean_plus_1    10.3
      mean_plus_2    11.3
dog   mean_plus_0    20.0
      mean_plus_1    21.0
      mean_plus_2    22.0
Name: height, dtype: float64
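If you would rather keep the named-aggregation syntax from the question, one possible approach is to build the keyword arguments in a dict comprehension and unpack them into agg. This is only a sketch: the names named_aggs and result are made up for illustration, and the column names follow the mean_height_plus_N pattern rather than the question's exact labels. Note the p=p default argument, which binds each parameter at definition time; without it every lambda would use the last value in the list.

```python
import pandas as pd

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})
parameters = [0, 1, 2, 359]

# One (column, function) pair per parameter, keyed by the output column name.
# named_aggs is a hypothetical helper name, not part of pandas.
named_aggs = {
    f'mean_height_plus_{p}': ('height', lambda x, p=p: x.mean() + p)
    for p in parameters
}

# Unpack the dict as keyword arguments, just like writing them out by hand.
result = animals.groupby('kind').agg(**named_aggs)
print(result)
```

Changing parameters is then the only edit needed to add or remove output columns.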
Or you can use a for loop:
groups = animals.groupby('kind')
ret_df = pd.DataFrame()
for i in parameters:
    ret_df[f'mean_plus_{i}'] = groups['height'].mean() + i
Output:
      mean_plus_0  mean_plus_1  mean_plus_2
kind
cat           9.3         10.3         11.3
dog          20.0         21.0         22.0
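Since the parameter is only added to the group mean here, the loop can probably be reduced to computing the mean once and building all columns in one step. A minimal sketch, assuming the aggregation really is "mean plus a constant" (base and result are illustrative names):

```python
import pandas as pd

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})
parameters = [0, 1, 2]

# Compute the per-group mean a single time instead of once per parameter.
base = animals.groupby('kind')['height'].mean()

# Each column is the same Series shifted by one parameter; they share the 'kind' index.
result = pd.DataFrame({f'mean_plus_{p}': base + p for p in parameters})
print(result)
```

This only works when the parameter can be applied after aggregating; if it has to be used inside the aggregation itself, the function-based version is the safer choice.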
Upvotes: 1