Reputation: 1019
I have a large DataFrame along the lines of the one below, and I am looking to calculate either mean or mode on particular columns. I am looking to groupby a feature (whether it is on or off) and also an ID column (1 or 2).
MY_DF
Feature ID ITEM1 ITEM2 ITEM3
index
1 ON 1 2.3 544 32.2
2 OFF 1 2.55 544 23.4
3 ON 2 8.32 674 15.5
4 ON 1 1.56 544 43.4
5 OFF 2 3.55 674 32.4
I am able to do both tasks separately to find mean and mode for particular columns, but I can't work out how to join the two outputs together, but then I was also wondering if there is a way to calculate mean or mode of selected columns in one hit?
In my current method, although I am able to do both tasks separetely I am struggling to join the two outputs together.
Here's my separate methods
#as mode is not an official function in Pandas, set out method here
from scipy.stats.mstats import mode
f = lambda x: mode(x, axis=None)[0]
means = MY_DF.groupby(['Feature','ID'])[ITEM1,ITEM3].mean()
moder = MY_DF.groupby(['Feature','ID'])[ITEM2].apply(f)
My mean method outputs something like this:
ITEM1 ITEM3
ON 1 1.93 37.8
2 8.32 15.5
OFF 1 2.55 23.4
2 3.55 32.4
My mode method outputs something like this:
ITEM2
ON 1 [544]
2 [674]
OFF 1 [544]
2 [674]
Im looking for a final output something like this if possible?
ITEM1 ITEM3 ITEM2
ON 1 1.93 37.8 544
2 8.32 15.5 674
OFF 1 2.55 23.4 544
2 3.55 32.4 674
Upvotes: 0
Views: 6747
Reputation: 879729
You can pass a dict to GroupBy.agg
. The keys refer to columns of the DataFrame and the value specifies the aggregation function. For example,
import pandas as pd
import scipy.stats.mstats as mstats
def mode(x):
return mstats.mode(x, axis=None)[0]
df = pd.DataFrame({'Feature': ['ON', 'OFF', 'ON', 'ON', 'OFF'],
'ID': [1, 1, 2, 1, 2],
'ITEM1': [2.3, 2.55, 8.32, 1.56, 3.55],
'ITEM2': [544, 544, 674, 544, 674],
'ITEM3': [32.2, 23.4, 15.5, 43.4, 32.4],})
df.groupby(['Feature','ID']).agg({'ITEM1':'mean','ITEM3':'mean','ITEM2':mode})
print(df)
yields
Feature ID ITEM1 ITEM2 ITEM3
0 ON 1 2.30 544 32.2
1 OFF 1 2.55 544 23.4
2 ON 2 8.32 674 15.5
3 ON 1 1.56 544 43.4
4 OFF 2 3.55 674 32.4
Upvotes: 2