Reputation: 2457
I have a DataFrame and need to apply the same calculations to many columns. Currently I'm doing it manually. Is there a good, elegant way to do this?
import numpy as np
import pandas as pd

tt = pd.DataFrame(data={'Status' : ['green','green','red','blue','red','yellow','black'],
                        'Group' : ['A','A','B','C','A','B','C'],
                        'City' : ['Toronto','Montreal','Vancouver','Toronto','Edmonton','Winnipeg','Windsor'],
                        'Sales' : [13,6,16,8,4,3,1], 'Counts' : [100,200,50,30,20,10,300]})
# Each column is listed by hand with the same three aggregations
ss = tt.groupby('Group').agg({'Sales':['count','mean',np.median],
                              'Counts':['count','mean',np.median]})
# Flatten the (column, function) MultiIndex into names like Sales_count
ss.columns = ['_'.join(col).strip() for col in ss.columns.values]
How could I apply the same calculations (count, mean, median) to every column without listing each one, given a very large DataFrame with many columns?
Upvotes: 3
Views: 306
Reputation: 26686
In pandas, agg accepts a single function or several and returns a summary of the results for the relevant columns. If you pass a list of functions, every function in the list is applied to each column, so here I pass a list to the aggregator. In your case you were passing a dictionary, which means you had to spell out each column individually, making it very manual. Happy to explain further if that's not clear.
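# Keep only the numeric columns; recent pandas versions raise on 'mean' for string columns
num_cols = tt.select_dtypes('number').columns.tolist()
ss = tt.groupby('Group')[num_cols].agg(['count', 'mean', 'median'])
ss.columns = ['_'.join(col).strip() for col in ss.columns.values]
ss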
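If it helps, here is a minimal, self-contained sketch that puts the list form and the dictionary form side by side. The names `funcs`, `num_cols`, `by_list` and `by_dict` are mine, introduced only for illustration, and I am assuming you only want the numeric columns summarised (recent pandas versions raise an error on `mean` for string columns such as Status and City rather than silently dropping them). The dictionary is built with a comprehension, so even that form does not need to be typed out by hand.

```python
import pandas as pd

tt = pd.DataFrame({
    'Status': ['green', 'green', 'red', 'blue', 'red', 'yellow', 'black'],
    'Group': ['A', 'A', 'B', 'C', 'A', 'B', 'C'],
    'City': ['Toronto', 'Montreal', 'Vancouver', 'Toronto',
             'Edmonton', 'Winnipeg', 'Windsor'],
    'Sales': [13, 6, 16, 8, 4, 3, 1],
    'Counts': [100, 200, 50, 30, 20, 10, 300],
})

# Illustrative names, not from the original post
funcs = ['count', 'mean', 'median']

# Assumption: only numeric columns should be summarised
num_cols = tt.select_dtypes(include='number').columns.tolist()

# List form: every function is applied to every selected column
by_list = tt.groupby('Group')[num_cols].agg(funcs)

# Dict form, built programmatically instead of by hand
by_dict = tt.groupby('Group').agg({col: funcs for col in num_cols})

# Flatten the (column, function) MultiIndex into names like Sales_count
by_list.columns = ['_'.join(col) for col in by_list.columns]
by_dict.columns = ['_'.join(col) for col in by_dict.columns]

print(by_list)
```

The dictionary form is still worth keeping in mind when different columns need different functions; when every column gets the same functions, the list form is the shorter of the two.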
Upvotes: 3