Reputation: 395
I have a dataframe with duplicates, and I would like to sum the values of a column based on the duplicates of two other columns, like describes in the picture bellow. I could iterate over the sets of column A and B and then sum each time, but probably there is a more efficient way to do it.
#To create the example dataframe
df1 = pd.DataFrame({'A': ['Spain', 'France','Spain','France','France','France','Spain',], 'B': ['Meat', 'Meat','Meat','Drink','Meat','Drink','Drink'], 'C': ['10','20','30','10','20','30','10']})
Any help would be appreciated.
Upvotes: 2
Views: 2058
Reputation: 3299
It sounds like Pandas groupby is what you need. This will set the index to the two columns, so you'll need to reset the index after grouping the data. You will also need to change the data type of your 'C' column to an integer, as it is currently strings (text) and will not sum.
df1 = pd.DataFrame({'A': ['Spain', 'France','Spain','France','France','France','Spain',],
'B': ['Meat', 'Meat','Meat','Drink','Meat','Drink','Drink'],
'C': ['10','20','30','10','20','30','10']})
df1['C'] = df1['C'].astype(int)
df1.groupby(['A', 'B']).sum().reset_index()
this will return the below as desired:
A B C
0 France Drink 10
1 France Meat 40
2 Spain Meat 40
Upvotes: 0
Reputation: 195613
Try:
df1["C"] = df1["C"].astype(int)
print(df1.groupby(["A", "B"], as_index=False).sum())
Prints:
A B C
0 France Drink 40
1 France Meat 40
2 Spain Drink 10
3 Spain Meat 40
df1
used:
A B C
0 Spain Meat 10
1 France Meat 20
2 Spain Meat 30
3 France Drink 10
4 France Meat 20
5 France Drink 30
6 Spain Drink 10
Upvotes: 1