Babas
Babas

Reputation: 395

Python Pandas add duplicate values based on columns

I have a dataframe with duplicates, and I would like to sum the values of a column based on the duplicates of two other columns, like describes in the picture bellow. I could iterate over the sets of column A and B and then sum each time, but probably there is a more efficient way to do it.

enter image description here

#To create the example dataframe
df1 = pd.DataFrame({'A': ['Spain', 'France','Spain','France','France','France','Spain',], 'B': ['Meat', 'Meat','Meat','Drink','Meat','Drink','Drink'], 'C': ['10','20','30','10','20','30','10']})

Any help would be appreciated.

Upvotes: 2

Views: 2058

Answers (2)

Emi OB
Emi OB

Reputation: 3299

It sounds like Pandas groupby is what you need. This will set the index to the two columns, so you'll need to reset the index after grouping the data. You will also need to change the data type of your 'C' column to an integer, as it is currently strings (text) and will not sum.

df1 = pd.DataFrame({'A': ['Spain', 'France','Spain','France','France','France','Spain',],
                    'B': ['Meat', 'Meat','Meat','Drink','Meat','Drink','Drink'],
                    'C': ['10','20','30','10','20','30','10']})
df1['C'] = df1['C'].astype(int)
df1.groupby(['A', 'B']).sum().reset_index()

this will return the below as desired:

        A      B   C
0  France  Drink  10
1  France   Meat  40
2   Spain   Meat  40

Upvotes: 0

Andrej Kesely
Andrej Kesely

Reputation: 195613

Try:

df1["C"] = df1["C"].astype(int)
print(df1.groupby(["A", "B"], as_index=False).sum())

Prints:

        A      B   C
0  France  Drink  40
1  France   Meat  40
2   Spain  Drink  10
3   Spain   Meat  40

df1 used:

        A      B   C
0   Spain   Meat  10
1  France   Meat  20
2   Spain   Meat  30
3  France  Drink  10
4  France   Meat  20
5  France  Drink  30
6   Spain  Drink  10

Upvotes: 1

Related Questions