user3222101

Reputation: 1330

How to apply different aggregate functions to different columns in pandas?

I have a DataFrame with many columns; some contain prices and the rest contain volumes, as below:

year_month   0_fx_price_gy 0_fx_volume_gy 1_fx_price_yuy 1_fx_volume_yuy
1990-01      2             10             3              30
1990-01      2             20             2              40
1990-02      2             30             3              50

I need to group by year_month and take the mean of the price columns and the sum of the volume columns.

Is there a quick way to do this in one statement, i.e. take the mean if the column name contains price and the sum if it contains volume?

df.groupby('year_month').?

Note: this is just sample data with fewer columns, but the format is the same.

Expected output:

year_month   0_fx_price_gy 0_fx_volume_gy 1_fx_price_yuy 1_fx_volume_yuy
1990-01      2             30             2.5            70
1990-02      2             30             3              50

Upvotes: 1

Views: 113

Answers (1)

jezrael

Reputation: 862771

Build a dictionary that maps each matched column name to its aggregation function and pass it to DataFrameGroupBy.agg; finally, add reindex if the order of the output columns has changed:

d1 = dict.fromkeys(df.columns[df.columns.str.contains('price')], 'mean')
d2 = dict.fromkeys(df.columns[df.columns.str.contains('volume')], 'sum')

#merge dicts together
d = {**d1, **d2}
print (d)
{'0_fx_price_gy': 'mean', '1_fx_price_yuy': 'mean',
 '0_fx_volume_gy': 'sum', '1_fx_volume_yuy': 'sum'}
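To make the need for reindex concrete, here is a minimal sketch that rebuilds the sample frame from the question and applies the merged dictionary. The variable names out and restored are only illustrative. Because the dictionary lists all price columns before the volume columns, the aggregated frame comes back in that order until reindex restores the original layout:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'year_month': ['1990-01', '1990-01', '1990-02'],
    '0_fx_price_gy': [2, 2, 2],
    '0_fx_volume_gy': [10, 20, 30],
    '1_fx_price_yuy': [3, 2, 3],
    '1_fx_volume_yuy': [30, 40, 50],
})

# Map price columns to 'mean' and volume columns to 'sum', then merge.
d1 = dict.fromkeys(df.columns[df.columns.str.contains('price')], 'mean')
d2 = dict.fromkeys(df.columns[df.columns.str.contains('volume')], 'sum')
d = {**d1, **d2}

# Output columns follow the dictionary order: prices first, then volumes.
out = df.groupby('year_month', as_index=False).agg(d)
print(list(out.columns))

# reindex puts the columns back in the order of the original frame.
restored = out.reindex(columns=df.columns)
print(list(restored.columns))
```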

Another way to build the dictionary:

d = {}
for c in df.columns:
    if 'price' in c:
        d[c] = 'mean'
    if 'volume' in c:
        d[c] = 'sum'

The solution can be simplified if every column after the first is either a price or a volume column; the first column is skipped with df.columns[1:]:

d = {x:'mean' if 'price' in x else 'sum' for x in df.columns[1:]}

df1 = df.groupby('year_month', as_index=False).agg(d).reindex(columns=df.columns)
print (df1)
  year_month  0_fx_price_gy  0_fx_volume_gy  1_fx_price_yuy  1_fx_volume_yuy
0    1990-01            2.0              30             2.5               70
1    1990-02            2.0              30             3.0               50
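If more name patterns are involved, the same idea can be wrapped in a small function. The sketch below is only one possible shape: agg_by_name and its rules parameter are hypothetical names, not part of pandas. It assumes each column should use the first rule whose substring appears in its name, and it drops columns that match no rule. Since the dictionary is built while walking df.columns in order, no reindex is needed here:

```python
import pandas as pd

def agg_by_name(df, key, rules):
    """Group by `key`, picking each column's aggregation by substring match.

    `rules` maps a substring to an aggregation name, e.g. {'price': 'mean'}.
    Hypothetical helper for illustration; unmatched columns are dropped.
    """
    spec = {}
    for col in df.columns:
        if col == key:
            continue
        for substring, func in rules.items():
            if substring in col:
                spec[col] = func  # first matching rule wins
                break
    return df.groupby(key, as_index=False).agg(spec)

# Sample data copied from the question.
df = pd.DataFrame({
    'year_month': ['1990-01', '1990-01', '1990-02'],
    '0_fx_price_gy': [2, 2, 2],
    '0_fx_volume_gy': [10, 20, 30],
    '1_fx_price_yuy': [3, 2, 3],
    '1_fx_volume_yuy': [30, 40, 50],
})

result = agg_by_name(df, 'year_month', {'price': 'mean', 'volume': 'sum'})
print(result)
```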

Upvotes: 1
