Reputation: 10033
I have a dataframe, with columns:
cols = ['A', 'B', 'C']
If I groupby one column, say, 'A', like so:
df.groupby('A')['B'].mean()
It works.
But I need to groupby one column and then get the mean of all other columns. I've tried:
df[cols].groupby('A').mean()
But I get the error:
KeyError: 'A'
What am I missing?
Upvotes: 3
Views: 2114
Reputation: 516
Perhaps the missing column is string rather than numeric?
import pandas as pd

df = pd.DataFrame({
    'A': ['big', 'small', 'small', 'small'],
    'B': [1, 0, 0, 0],
    'C': [1, 1, 1, 0],
    'D': ['1', '0', '0', '0']  # digits stored as strings, not numbers
})
df.groupby(['A']).mean()
Output:
A | B | C |
---|---|---|
big | 1.0 | 1.0 |
small | 0.0 | 0.6666666666666666 |
Here, converting the column to a numeric type such as int or float produces the desired result:
df.D = df.D.astype(int)
df.groupby(['A']).mean()
Output:
A | B | C | D |
---|---|---|---|
big | 1.0 | 1.0 | 1.0 |
small | 0.0 | 0.6666666666666666 | 0.0 |
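
As a self-contained sketch of the same idea: recent pandas versions (2.0 and later, as far as I know) no longer drop the string column silently and raise a TypeError instead, so it may help to be explicit. The names numeric_means, converted and all_means are only illustrative, and pd.to_numeric is swapped in for astype(int) as one possible way to convert the column.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['big', 'small', 'small', 'small'],
    'B': [1, 0, 0, 0],
    'C': [1, 1, 1, 0],
    'D': ['1', '0', '0', '0'],  # digits stored as strings
})

# Option 1: explicitly ignore non-numeric columns such as 'D'.
numeric_means = df.groupby('A').mean(numeric_only=True)

# Option 2: convert 'D' to a numeric dtype first, then average every column.
converted = df.assign(D=pd.to_numeric(df['D']))
all_means = converted.groupby('A').mean()

print(numeric_means)
print(all_means)
```

Option 1 leaves 'D' out of the result; option 2 includes it.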
Upvotes: 0
Reputation: 17814
You can use df.groupby('col').mean(). For example, to calculate the mean using only columns 'A', 'B' and 'C' (grouping by 'A'):
A B C D
0 1 NaN 1 1
1 1 2.0 2 1
2 2 3.0 1 1
3 1 4.0 1 1
4 2 5.0 2 1
df[['A', 'B', 'C']].groupby('A').mean()
or
df.groupby('A')[['B', 'C']].mean()
Output:
B C
A
1 3.0 1.333333
2 4.0 1.500000
If you need mean for all columns:
df.groupby('A').mean()
Output:
B C D
A
1 3.0 1.333333 1.0
2 4.0 1.500000 1.0
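
For anyone who wants to run this, here is a minimal sketch that rebuilds the sample frame above. It also shows one possible cause of the KeyError from the question: the key column is not among the selected columns when groupby is called. Whether that is what happened in the asker's case is an assumption; the names means and key_error_raised are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 2, 1, 2],
    'B': [np.nan, 2.0, 3.0, 4.0, 5.0],
    'C': [1, 2, 1, 1, 2],
    'D': [1, 1, 1, 1, 1],
})

# Group first, then pick the value columns; mean() skips NaN by default.
means = df.groupby('A')[['B', 'C']].mean()
print(means)

# Dropping the key column before grouping raises KeyError: 'A'.
try:
    df[['B', 'C']].groupby('A').mean()
    key_error_raised = False
except KeyError:
    key_error_raised = True
print(key_error_raised)
```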
Upvotes: 0
Reputation: 26676
Please try:
df.groupby('A').agg('mean')
Sample data:
B C A
0 1 4 K
1 2 6 S
2 4 7 K
3 6 3 K
4 2 1 S
5 7 3 K
6 8 9 K
7 9 3 K
print(df.groupby('A').agg('mean'))
B C
A
K 5.833333 4.833333
S 2.000000 3.500000
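
A runnable sketch of the above, using the same sample data. As far as I can tell, agg('mean') gives the same result as calling mean() directly; agg becomes more useful when several statistics are wanted at once. The names result and summary are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'B': [1, 2, 4, 6, 2, 7, 8, 9],
    'C': [4, 6, 7, 3, 1, 3, 9, 3],
    'A': ['K', 'S', 'K', 'K', 'S', 'K', 'K', 'K'],
})

# Mean of every non-key column per group.
result = df.groupby('A').agg('mean')
print(result)

# Several aggregations at once: yields (column, statistic) column pairs.
summary = df.groupby('A').agg(['mean', 'count'])
print(summary)
```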
Upvotes: 1