newleaf

Reputation: 2457

Pandas groupby aggregate apply multiple functions to multiple columns

I have a dataframe and need to apply the same calculations to many columns. Currently I list each column manually. Is there a cleaner way to do this?

import numpy as np
import pandas as pd

tt = pd.DataFrame(data={'Status': ['green', 'green', 'red', 'blue', 'red', 'yellow', 'black'],
                        'Group': ['A', 'A', 'B', 'C', 'A', 'B', 'C'],
                        'City': ['Toronto', 'Montreal', 'Vancouver', 'Toronto', 'Edmonton', 'Winnipeg', 'Windsor'],
                        'Sales': [13, 6, 16, 8, 4, 3, 1], 'Counts': [100, 200, 50, 30, 20, 10, 300]})


ss = tt.groupby('Group').agg({'Sales': ['count', 'mean', np.median],
                              'Counts': ['count', 'mean', np.median]})
# Flatten the two-level column index, e.g. ('Sales', 'mean') -> 'Sales_mean'
ss.columns = ['_'.join(col).strip() for col in ss.columns.values]

The result (posted as an image) has one row per Group and flattened columns such as Sales_count, Sales_mean and Sales_median.

How can I apply the same calculations (count, mean, median) to every column when the dataframe is very large and has many columns?

Upvotes: 3

Views: 306

Answers (1)

wwnde

Reputation: 26686

In pandas, agg accepts either a single function or a list of functions and returns a summary of the results. When you pass a list, every function in it is applied to every non-grouping column, so I just pass the list of functions to the aggregator. You were passing a dictionary, which means you had to spell out each column individually, and that is what made it so manual. One caveat: on recent pandas versions, mean and median raise on string columns such as Status and City, so select the numeric columns before aggregating. Happy to explain further if anything is unclear.

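# Keep only numeric columns: 'mean' and 'median' fail on string columns in pandas 2.x
num_cols = tt.select_dtypes('number').columns.tolist()
ss = tt.groupby('Group')[num_cols].agg(['count', 'mean', 'median'])
ss.columns = ['_'.join(col) for col in ss.columns]
ss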
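
If you only want some of the columns, or want to see how the list form relates to the dictionary form from the question, here is a minimal sketch. It assumes the same tt dataframe; the names value_cols, funcs, by_dict, by_list and summary are my own and not part of pandas. The dictionary is built with a comprehension, so nothing has to be typed per column, and both forms should give the same table.

```python
import pandas as pd

tt = pd.DataFrame(data={
    'Status': ['green', 'green', 'red', 'blue', 'red', 'yellow', 'black'],
    'Group': ['A', 'A', 'B', 'C', 'A', 'B', 'C'],
    'City': ['Toronto', 'Montreal', 'Vancouver', 'Toronto', 'Edmonton', 'Winnipeg', 'Windsor'],
    'Sales': [13, 6, 16, 8, 4, 3, 1],
    'Counts': [100, 200, 50, 30, 20, 10, 300],
})

funcs = ['count', 'mean', 'median']   # same functions for every column
value_cols = ['Sales', 'Counts']      # assumed subset; could come from select_dtypes('number')

# Dictionary form, generated instead of typed out column by column
by_dict = tt.groupby('Group').agg({col: funcs for col in value_cols})

# List form on a column selection: the functions apply to every selected column
by_list = tt.groupby('Group')[value_cols].agg(funcs)

# Flatten the two-level columns, e.g. ('Sales', 'mean') -> 'Sales_mean'
summary = by_list.copy()
summary.columns = ['_'.join(col) for col in summary.columns]
print(summary)
```

The dictionary form is still the one to reach for when different columns need different functions; the list form is shorter when they all share the same ones.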

Upvotes: 3
