Reputation: 475
I have this dataset with group and sales for period 1 and 2:
index = [1, 2, 3, 4, 5, 6]
a = ['A', 'B', 'A', 'B', 'C', 'B']
b = ['C', 'A', 'B', 'B', 'C', 'B']
c = [300, 109, 441, 375, 243, 340]
d = [359, 244, 155, 241, 429, 166]
df = pd.DataFrame({'ID': index, 'Group_1': a, 'Group_2': b, 'sales_1': c,'sales_2':d})
I want to count the number of customers in each group with the sales average in that period. The result should look like this (I do it in Excel):
Group count_1 avg_sales_1 count_2 avg_sales_2
0 A 2 385.500 1 244.000
1 B 3 274.666 3 187.333
2 C 1 243.000 2 394.000
I have tried these code, however I could only do with one period each time:
df.groupby(['Group_1']).agg({'ID':'count', 'sales_1':'mean'})
df.groupby(['Group_2']).agg({'ID':'count', 'sales_2':'mean'})
So how do I merge these two group as the result? Are there any faster way to do this task?
Upvotes: 1
Views: 1348
Reputation: 863741
I think if only few groups, your soluton is nice. Only last use concat
with DataFrame.rename_axis
and DataFrame.reset_index
:
df1 = df.groupby(['Group_1']).agg({'ID':'count', 'sales_1':'mean'})
df2 = df.groupby(['Group_2']).agg({'ID':'count', 'sales_2':'mean'})
df = pd.concat([df1, df2], axis=1).rename_axis('Group').reset_index()
print (df)
Group ID sales_1 ID sales_2
0 A 2 370.500000 1 244.000000
1 B 3 274.666667 3 187.333333
2 C 1 243.000000 2 394.000000
If use pandas 0.25+ is possible use named aggregation:
df1 = df.groupby(['Group_1']).agg(count_1=pd.NamedAgg(column='ID', aggfunc='count'),
avg_sales_1=pd.NamedAgg(column='sales_1', aggfunc='mean'))
df2 = df.groupby(['Group_2']).agg(count_2=pd.NamedAgg(column='ID', aggfunc='count'),
avg_sales_2=pd.NamedAgg(column='sales_2', aggfunc='mean'))
df = pd.concat([df1, df2], axis=1).rename_axis('Group').reset_index()
print (df)
Group count_1 avg_sales_1 count_2 avg_sales_2
0 A 2 370.500000 1 244.000000
1 B 3 274.666667 3 187.333333
2 C 1 243.000000 2 394.000000
If many groups then solution is more complicated - first reshape by wide_to_long
, then aggregate by GroupBy.agg
and last reshape by DataFrame.unstack
:
df = (pd.wide_to_long(df,
stubnames=['Group','sales'],
i=['ID'],
j='g',
sep='_')).reset_index()
df = (df.groupby(['Group','g'])['sales'].agg([('count', 'count'), ('avg','mean')])
.unstack()
.sort_index(level=1, axis=1))
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print (df)
Group avg_1 count_1 avg_2 count_2
0 A 370.500000 2 244.000000 1
1 B 274.666667 3 187.333333 3
2 C 243.000000 1 394.000000 2
Upvotes: 2
Reputation: 470
Try this:
In [33]: df1 = df.groupby(['Group_1']).agg({'ID':'count', 'sales_1':'mean'})
In [34]: df2 = df.groupby(['Group_2']).agg({'ID':'count', 'sales_2':'mean'})
In [35]: pd.concat([df1,df2], axis=1)
Out[35]:
ID sales_1 ID sales_2
A 2 370.500000 1 244.000000
B 3 274.666667 3 187.333333
C 1 243.000000 2 394.000000
Upvotes: 1