Reputation: 10043

Pandas - groupby one column and get mean of all other columns

I have a dataframe, with columns:

cols = ['A', 'B', 'C']

If I groupby one column, say, 'A', like so:

df.groupby('A')['B'].mean()

It works.

But I need to groupby one column and then get the mean of all other columns. I've tried:

df[cols].groupby('A').mean()

But I get the error:

KeyError: 'A'

What am I missing?

Upvotes: 3

Answers (3)

There

Reputation: 516

Perhaps the column missing from your result is stored as strings rather than numbers?

import pandas as pd

df = pd.DataFrame({
  'A': ['big', 'small', 'small', 'small'],
  'B': [1, 0, 0, 0],
  'C': [1, 1, 1, 0],
  'D': ['1', '0', '0', '0']  # strings, not numbers
})
df.groupby(['A']).mean()

Output:

A	B	C
big	1.0	1.0
small	0.0	0.6666666666666666

Here, converting the column to a numeric type such as int or float produces the desired result:

df.D = df.D.astype(int)
df.groupby(['A']).mean()

Output:

A	B	C	D
big	1.0	1.0	1.0
small	0.0	0.6666666666666666	0.0
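A note on versions: as far as I know, pandas 2.0 and later no longer drop non-numeric columns silently in groupby means, so the first call above may raise an error instead of skipping 'D'. A minimal sketch that should behave the same on old and new versions, assuming the same sample frame, passes numeric_only=True explicitly (the names numeric_means, converted and all_means are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['big', 'small', 'small', 'small'],
    'B': [1, 0, 0, 0],
    'C': [1, 1, 1, 0],
    'D': ['1', '0', '0', '0'],  # strings, not numbers
})

# Restrict the aggregation to numeric columns; the string column 'D' is skipped.
numeric_means = df.groupby('A').mean(numeric_only=True)

# Convert 'D' to integers on a copy, then 'D' is included in the means as well.
converted = df.assign(D=df['D'].astype(int))
all_means = converted.groupby('A').mean(numeric_only=True)

print(numeric_means)
print(all_means)
```

Using assign here leaves the original df unchanged, which makes it easier to compare the two results side by side.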

Upvotes: 0

Mykola Zotko

Reputation: 17911

You can use df.groupby('col').mean(). For example, to group by column 'A' and calculate the mean of columns 'B' and 'C':

   A    B  C  D
0  1  NaN  1  1
1  1  2.0  2  1
2  2  3.0  1  1
3  1  4.0  1  1
4  2  5.0  2  1

df[['A', 'B', 'C']].groupby('A').mean()

df.groupby('A')[['B', 'C']].mean()

Output:

     B         C
A
1  3.0  1.333333
2  4.0  1.500000

If you need mean for all columns:

df.groupby('A').mean()

Output:

     B         C    D
A
1  3.0  1.333333  1.0
2  4.0  1.500000  1.0
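For anyone who wants to reproduce this, here is a small sketch that rebuilds the sample frame shown above and runs both variants. The construction code is my reconstruction from the printed table, not the original source of the data, and the names subset_means and all_means are only illustrative. Note that mean() skips NaN values by default, which is why group 1 in column 'B' averages only 2.0 and 4.0.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the printed table above.
df = pd.DataFrame({
    'A': [1, 1, 2, 1, 2],
    'B': [np.nan, 2.0, 3.0, 4.0, 5.0],
    'C': [1, 2, 1, 1, 2],
    'D': [1, 1, 1, 1, 1],
})

# Select the columns of interest first, then group by 'A'.
subset_means = df[['A', 'B', 'C']].groupby('A').mean()

# Group by 'A' and take the mean of every remaining column.
all_means = df.groupby('A').mean()

print(subset_means)
print(all_means)
```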

Upvotes: 0

wwnde

Reputation: 26676

Please try:

df.groupby('A').agg('mean')

Sample data:

   B  C  A
0  1  4  K
1  2  6  S
2  4  7  K
3  6  3  K
4  2  1  S
5  7  3  K
6  8  9  K
7  9  3  K


print(df.groupby('A').agg('mean'))

Output:

          B         C
A
K  5.833333  4.833333
S  2.000000  3.500000

Upvotes: 1
