Python Pandas add duplicate values based on columns

Question

I have a dataframe with duplicates, and I would like to sum the values of a column based on the duplicates of two other columns, like describes in the picture bellow. I could iterate over the sets of column A and B and then sum each time, but probably there is a more efficient way to do it.

#To create the example dataframe
df1 = pd.DataFrame({'A': ['Spain', 'France','Spain','France','France','France','Spain',], 'B': ['Meat', 'Meat','Meat','Drink','Meat','Drink','Drink'], 'C': ['10','20','30','10','20','30','10']})

Any help would be appreciated.

Andrej Kesely · Accepted Answer

Try:

df1["C"] = df1["C"].astype(int)
print(df1.groupby(["A", "B"], as_index=False).sum())

Prints:

        A      B   C
0  France  Drink  40
1  France   Meat  40
2   Spain  Drink  10
3   Spain   Meat  40

df1 used:

        A      B   C
0   Spain   Meat  10
1  France   Meat  20
2   Spain   Meat  30
3  France  Drink  10
4  France   Meat  20
5  France  Drink  30
6   Spain  Drink  10

Python Pandas add duplicate values based on columns

Answers (2)

Related Questions