Reputation: 934

How to apply a function to all columns of a DataFrame group-wise (in Python pandas)?

How can I apply a function to each column of a DataFrame group-wise? That is, group by the values of one column and calculate, e.g., the mean of every other column within each group. The expected output is a DataFrame whose index holds the group names and whose values are the per-group, per-column means.

E.g. consider:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=['A', 'B', 'C', 'D'])
df['group'] = ['a', 'a', 'b', 'b']


    A   B   C   D   group
0   0   1   2   3   a
1   4   5   6   7   a
2   8   9   10  11  b
3   12  13  14  15  b

I want to calculate, e.g., np.mean for each column, but per group. In this particular example it can be done with:

t = df.groupby('group').agg({'A': np.mean, 'B': np.mean, 'C': np.mean, 'D': np.mean })

    A   B   C   D
group               
a   2   3   4   5
b   10  11  12  13

However, this requires spelling out the column names explicitly ('A': np.mean, 'B': np.mean, 'C': np.mean, 'D': np.mean), which is unacceptable for my task, since the column names can change.

Upvotes: 2

Answers (3)

jezrael

Reputation: 862661

As MaxU commented, it is simpler to use groupby + GroupBy.mean:

df1 = df.groupby('group').mean()
print (df1)
        A   B   C   D
group                
a       2   3   4   5
b      10  11  12  13

If you need the group labels as a column instead of the index:

df1 = df.groupby('group', as_index=False).mean()
print (df1)
  group   A   B   C   D
0     a   2   3   4   5
1     b  10  11  12  13
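Because no column names appear in the call, the same line keeps working when the columns are renamed. A minimal sketch of that, assuming the example frame from the question; the new names 'x' and 'y' are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=['A', 'B', 'C', 'D'])
df['group'] = ['a', 'a', 'b', 'b']

# Hypothetical rename, standing in for "the column names can change".
renamed = df.rename(columns={'A': 'x', 'B': 'y'})

# The call itself never mentions a value column, so it needs no update.
means = renamed.groupby('group').mean()

# Same computation, with the group labels kept as a regular column.
means_flat = renamed.groupby('group', as_index=False).mean()

print(means)
print(means_flat)
```

Note that this relies on every non-grouping column being numeric, as in the question's example.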

Upvotes: 2

asongtoruin

Reputation: 10359

You don't need to explicitly name the columns.

df.groupby('group').agg('mean')

This produces the mean of each column for each group, as requested:

        A   B   C   D
group                
a       2   3   4   5
b      10  11  12  13
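The same pattern should extend beyond 'mean': agg also accepts a callable or a list of function names, and applies each one to every non-grouping column. A small sketch under that assumption; value_range is a hypothetical helper, not something from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=['A', 'B', 'C', 'D'])
df['group'] = ['a', 'a', 'b', 'b']


def value_range(col):
    """Hypothetical custom aggregation: spread of one column within one group."""
    return col.max() - col.min()


# A single callable is applied to each column of each group.
ranges = df.groupby('group').agg(value_range)

# A list of names yields two-level columns: (column, function).
summary = df.groupby('group').agg(['mean', 'max'])

print(ranges)
print(summary)
```

With a list of functions, individual results are addressed by a (column, function) pair, e.g. summary[('A', 'mean')].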

Upvotes: 2

gented

Reputation: 1687

The following does the job:

df.groupby('group').apply(np.mean, axis=0)

which gives:

          A     B     C     D
group                        
a       2.0   3.0   4.0   5.0
b      10.0  11.0  12.0  13.0

apply forwards extra keyword arguments to the function, so axis=0 is handed to np.mean: axis=0 averages down each column within a group, while axis=1 would average across each row.
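To illustrate how apply passes keyword arguments through, here is a sketch with a hypothetical function scaled_range and a hypothetical parameter scale. It selects the value columns first (derived from df.columns rather than typed out), since recent pandas versions may raise an error when a function like np.mean meets the string-valued 'group' column inside each group:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=['A', 'B', 'C', 'D'])
df['group'] = ['a', 'a', 'b', 'b']

# Every column except the grouping key, computed rather than hard-coded.
value_cols = [c for c in df.columns if c != 'group']


def scaled_range(frame, scale=1.0):
    """Hypothetical helper: per-column (max - min) of one group, times scale."""
    return (frame.max(axis=0) - frame.min(axis=0)) * scale


# scale=0.5 is not consumed by apply; it is forwarded to scaled_range.
result = df.groupby('group')[value_cols].apply(scaled_range, scale=0.5)

print(result)
```

Each call to scaled_range receives one group's sub-frame and returns a Series indexed by column name, which apply stacks into one row per group.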

Upvotes: 1
