Long_NgV

Reputation: 475

Sum and count values by group

I have this dataset with group and sales for period 1 and 2:

index = [1, 2, 3, 4, 5, 6]
a = ['A', 'B', 'A', 'B', 'C', 'B']
b = ['C', 'A', 'B', 'B', 'C', 'B']
c = [300, 109, 441, 375, 243, 340]
d = [359, 244, 155, 241, 429, 166]
df = pd.DataFrame({'ID': index, 'Group_1': a, 'Group_2': b, 'sales_1': c,'sales_2':d})

I want to count the number of customers in each group, together with the average sales for that period. The result should look like this (I made it in Excel):

  Group  count_1  avg_sales_1  count_2  avg_sales_2
0     A        2      370.500        1      244.000
1     B        3      274.666        3      187.333
2     C        1      243.000        2      394.000

I have tried this code, but I could only handle one period at a time:

df.groupby(['Group_1']).agg({'ID':'count', 'sales_1':'mean'})
df.groupby(['Group_2']).agg({'ID':'count', 'sales_2':'mean'})

So how do I merge these two groupings into one result? And is there a faster way to do this task?

Upvotes: 1

Views: 1348

Answers (2)

jezrael

Reputation: 863741

I think if there are only a few groups, your solution is fine. Just combine the two results at the end with concat, then DataFrame.rename_axis and DataFrame.reset_index:

df1 = df.groupby(['Group_1']).agg({'ID':'count', 'sales_1':'mean'})
df2 = df.groupby(['Group_2']).agg({'ID':'count', 'sales_2':'mean'})
df = pd.concat([df1, df2], axis=1).rename_axis('Group').reset_index()
print (df)
  Group  ID     sales_1  ID     sales_2
0     A   2  370.500000   1  244.000000
1     B   3  274.666667   3  187.333333
2     C   1  243.000000   2  394.000000

If you use pandas 0.25+, it is possible to use named aggregation:

df1 = df.groupby(['Group_1']).agg(count_1=pd.NamedAgg(column='ID', aggfunc='count'),
                                  avg_sales_1=pd.NamedAgg(column='sales_1', aggfunc='mean'))

df2 = df.groupby(['Group_2']).agg(count_2=pd.NamedAgg(column='ID', aggfunc='count'),
                                  avg_sales_2=pd.NamedAgg(column='sales_2', aggfunc='mean'))

df = pd.concat([df1, df2], axis=1).rename_axis('Group').reset_index()
print (df)
  Group  count_1  avg_sales_1  count_2  avg_sales_2
0     A        2   370.500000        1   244.000000
1     B        3   274.666667        3   187.333333
2     C        1   243.000000        2   394.000000
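A side note not covered in the answer above: pandas 0.25+ also accepts a shorter tuple form of named aggregation, `new_name=(column, aggfunc)`, which gives the same result as the `pd.NamedAgg` version. A minimal sketch using the question's data:

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6],
                   'Group_1': ['A', 'B', 'A', 'B', 'C', 'B'],
                   'Group_2': ['C', 'A', 'B', 'B', 'C', 'B'],
                   'sales_1': [300, 109, 441, 375, 243, 340],
                   'sales_2': [359, 244, 155, 241, 429, 166]})

# tuple shorthand: new_column=(source_column, aggfunc)
df1 = df.groupby('Group_1').agg(count_1=('ID', 'count'),
                                avg_sales_1=('sales_1', 'mean'))
df2 = df.groupby('Group_2').agg(count_2=('ID', 'count'),
                                avg_sales_2=('sales_2', 'mean'))

out = pd.concat([df1, df2], axis=1).rename_axis('Group').reset_index()
print(out)
```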

If there are many groups, the solution is more complicated - first reshape by wide_to_long, then aggregate by GroupBy.agg and last reshape back by DataFrame.unstack:

df = pd.wide_to_long(df,
                     stubnames=['Group', 'sales'],
                     i='ID',
                     j='g',
                     sep='_').reset_index()

df = (df.groupby(['Group','g'])['sales'].agg([('count', 'count'), ('avg','mean')])
       .unstack()
       .sort_index(level=1, axis=1))
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print (df)
  Group       avg_1  count_1       avg_2  count_2
0     A  370.500000        2  244.000000        1
1     B  274.666667        3  187.333333        3
2     C  243.000000        1  394.000000        2
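For readers unfamiliar with wide_to_long, it may help to see the intermediate long format on its own (an illustration I added, using the question's data): each of the 6 original rows is split into one row per period, which is why a single groupby/agg can then cover both periods at once.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6],
                   'Group_1': ['A', 'B', 'A', 'B', 'C', 'B'],
                   'Group_2': ['C', 'A', 'B', 'B', 'C', 'B'],
                   'sales_1': [300, 109, 441, 375, 243, 340],
                   'sales_2': [359, 244, 155, 241, 429, 166]})

# each (ID, period) pair becomes one row with a single Group/sales pair,
# so 6 wide rows turn into 12 long rows
long = pd.wide_to_long(df, stubnames=['Group', 'sales'],
                       i='ID', j='g', sep='_').reset_index()
print(long)
```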

Upvotes: 2

abheet22

Reputation: 470

Try this:

In [33]: df1 = df.groupby(['Group_1']).agg({'ID':'count', 'sales_1':'mean'})

In [34]: df2 = df.groupby(['Group_2']).agg({'ID':'count', 'sales_2':'mean'})

In [35]: pd.concat([df1,df2], axis=1)
Out[35]:
   ID     sales_1  ID     sales_2
A   2  370.500000   1  244.000000
B   3  274.666667   3  187.333333
C   1  243.000000   2  394.000000
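Note that this leaves two columns both named ID. One possible follow-up (my addition, not part of the original answer) is to rename the columns after the concat so the output matches the headers in the question:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6],
                   'Group_1': ['A', 'B', 'A', 'B', 'C', 'B'],
                   'Group_2': ['C', 'A', 'B', 'B', 'C', 'B'],
                   'sales_1': [300, 109, 441, 375, 243, 340],
                   'sales_2': [359, 244, 155, 241, 429, 166]})

df1 = df.groupby('Group_1').agg({'ID': 'count', 'sales_1': 'mean'})
df2 = df.groupby('Group_2').agg({'ID': 'count', 'sales_2': 'mean'})

# rename the duplicated ID/sales columns to the desired headers
out = pd.concat([df1, df2], axis=1)
out.columns = ['count_1', 'avg_sales_1', 'count_2', 'avg_sales_2']
out = out.rename_axis('Group').reset_index()
print(out)
```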

Upvotes: 1
