Reputation: 175
I have the following df, which I would like to group by 'Name' so that there is a count column for each of 'A' and 'B' and a 'Total Sales ($)' sum column.
For example, turn this:
data = {'A or B' : ['A','A','B','B','A','B'],
'Name' : ['Ben','Ben','Ben','Sam','Sam','Sam'],
'Sales ($)' : [10,5,2,5,6,7]
}
df=pd.DataFrame(data, columns = ['A or B','Name','Sales ($)'])
so it looks like this:
grouped_data = {'A' : [2,1],
'B' : [1,2],
'Name' : ['Ben','Sam'],
'Total Sales ($)' : [17,18]
}
df=pd.DataFrame(grouped_data, columns = ['A','B','Name','Total Sales ($)'])
Upvotes: 3
Views: 539
Reputation: 2811
You can use named aggregations inside groupby:
df.groupby('Name').agg(
    A=('A or B', lambda x: (x == 'A').sum()),
    B=('A or B', lambda x: (x == 'B').sum()),
    total=('Sales ($)', 'sum'),
).reset_index()
#output
Name A B total
0 Ben 2 1 17
1 Sam 1 2 18
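If the 'A or B' column had more than two distinct values, writing one lambda per value would get tedious. A minimal sketch of one alternative, assuming pd.crosstab is acceptable for the counting step (this is a swapped-in technique, not the named aggregation shown above; the variable names counts, totals and result are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'A or B': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Name': ['Ben', 'Ben', 'Ben', 'Sam', 'Sam', 'Sam'],
    'Sales ($)': [10, 5, 2, 5, 6, 7],
})

# Count each 'A or B' value per Name; one column per distinct category.
counts = pd.crosstab(df['Name'], df['A or B'])

# Sum sales per Name and name the Series like the 'total' column above.
totals = df.groupby('Name')['Sales ($)'].sum().rename('total')

# Both objects are indexed by Name, so join aligns them row by row.
result = counts.join(totals).reset_index()
result.columns.name = None  # drop any leftover 'A or B' label on the column axis
print(result)
```

The result should have the same columns as the output above (Name, A, B, total), and new categories would appear as extra count columns without further code changes.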
Upvotes: 1
Reputation: 6483
You can try pd.get_dummies, join, and groupby + sum:
pd.get_dummies(df['A or B'], dtype=int)\
.join(df.drop(columns='A or B'))\
.groupby('Name',as_index=False).sum()
Output:
Name A B Sales ($)
0 Ben 2 1 17
1 Sam 1 2 18
Details:
First, use get_dummies to convert the categorical variable into dummy/indicator variables (dtype=int keeps them as 0/1 rather than booleans on newer pandas versions):
pd.get_dummies(df['A or B'], dtype=int)
# A B
#0 1 0
#1 1 0
#2 0 1
#3 0 1
#4 1 0
#5 0 1
Then use join to concatenate the dummies with the original df, with the 'A or B' column dropped:
pd.get_dummies(df['A or B'], dtype=int).join(df.drop(columns='A or B'))
# A B Name Sales ($)
#0 1 0 Ben 10
#1 1 0 Ben 5
#2 0 1 Ben 2
#3 0 1 Sam 5
#4 1 0 Sam 6
#5 0 1 Sam 7
And finally, do the groupby + sum based on 'Name':
pd.get_dummies(df['A or B'], dtype=int).join(df.drop(columns='A or B')).groupby('Name',as_index=False).sum()
# Name A B Sales ($)
#0 Ben 2 1 17
#1 Sam 1 2 18
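For reuse, the three steps above can be wrapped in a small function. This is only a sketch: dummy_counts_and_sums and its parameters cat_col and group_col are hypothetical names, not part of pandas, and it assumes every remaining non-key column is numeric and should be summed.

```python
import pandas as pd

def dummy_counts_and_sums(df, cat_col, group_col):
    """Hypothetical helper: one count column per category, plus sums of the other columns."""
    # Step 1: indicator columns, forced to 0/1 integers.
    dummies = pd.get_dummies(df[cat_col], dtype=int)
    # Step 2: attach them to the frame with the categorical column removed.
    combined = dummies.join(df.drop(columns=cat_col))
    # Step 3: summing the indicators per group yields the counts.
    return combined.groupby(group_col, as_index=False).sum()

df = pd.DataFrame({
    'A or B': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Name': ['Ben', 'Ben', 'Ben', 'Sam', 'Sam', 'Sam'],
    'Sales ($)': [10, 5, 2, 5, 6, 7],
})

out = dummy_counts_and_sums(df, cat_col='A or B', group_col='Name')
print(out)
```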
Upvotes: 3
Reputation: 61
Step by step solution:
import pandas as pd
data = {'A or B' : ['A','A','B','B','A','B'],
'Name' : ['Ben','Ben','Ben','Sam','Sam','Sam'],
'Sales ($)' : [10,5,2,5,6,7]
}
df=pd.DataFrame(data, columns = ['A or B','Name','Sales ($)'])
#first create dummy for 'A or B' column
y = pd.get_dummies(df['A or B'])
#concatenate with original data frame
df=pd.concat([y,df], axis=1)
#delete the column
del df['A or B']
#now do the group by
df=df.groupby('Name').agg({'A':'sum',
'B':'sum',
'Sales ($)': 'sum'})
#reset the index
df.reset_index(level=0, inplace=True)
print(df)
Output:
Name A B Sales ($)
0 Ben 2 1 17
1 Sam 1 2 18
Upvotes: 0