Dataframes in Python - matching multiple columns of rows between two data frames

Question

I have two data frames. df1 holds a 'grouped inventory' of items, grouped by the numerical values A, B and C. For each group there is a SUM column, which should reflect the total price of all the items I have of that particular type. Initially I have set the SUM column to zero.

df2 is a list of items I have with A, B, C and the price of the item.

df1 (Initial Inventory):

A     B     C   SUM
1     1     1    0
1     1     2    0
1     2     2    0
2     2     2    0

df2 (List of items):

A     B     C   PRICE
2     2     2    30
1     1     2    100
1     1     2    110
1     1     2    105

So my code should convert df1 into:

df1 (expected output):

A     B     C   SUM
1     1     1    0
1     1     2    315
1     2     2    0
2     2     2    30

Explanation: My list of items (df2) contains one item coded 2,2,2 with a value of 30, and three items coded 1,1,2 with values of 100 + 110 + 105 = 315. So I update the inventory table df1 to reflect that I have a total value of 30 for items coded 2,2,2 and a total value of 315 for items coded 1,1,2. I have 0 in value for items coded 1,1,1 and 1,2,2, since they aren't found in my items list.

What would be the most efficient way to do this?

I would rather not use loops since df1 is 720 rows and df2 is 10,000 rows.

Tai · Accepted Answer

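You can first sum PRICE in df2 for each combination of "A", "B", and "C", then merge the result into df1 on those columns with how="left". (The key combinations in df2_sum below are a subset of those in df1, so a left merge keeps every row of df1.)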

df2_sum = df2.groupby(["A", "B", "C"])["PRICE"].sum().reset_index()

df1.merge(df2_sum, on=["A","B","C"], how="left").fillna(0)

   A  B  C  SUM  PRICE
0  1  1  1    0    0.0
1  1  1  2    0  315.0
2  1  2  2    0    0.0
3  2  2  2    0   30.0

You can then add PRICE to your SUM column.
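
Putting the steps together, here is a minimal end-to-end sketch using the sample data from the question. The variable names `keys`, `merged`, and `result` are only illustrative, and the cast back to integers assumes the prices are whole numbers, as they are in the example.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "A": [1, 1, 1, 2],
    "B": [1, 1, 2, 2],
    "C": [1, 2, 2, 2],
    "SUM": [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "A": [2, 1, 1, 1],
    "B": [2, 1, 1, 1],
    "C": [2, 2, 2, 2],
    "PRICE": [30, 100, 110, 105],
})

keys = ["A", "B", "C"]

# Total price per (A, B, C) combination in the item list.
df2_sum = df2.groupby(keys)["PRICE"].sum().reset_index()

# A left merge keeps every inventory row; groups with no items get NaN in PRICE.
merged = df1.merge(df2_sum, on=keys, how="left")

# Treat missing totals as 0, add them to SUM, then drop the helper column.
# Assumption: prices are whole numbers, so casting to int loses nothing.
merged["SUM"] = merged["SUM"] + merged["PRICE"].fillna(0).astype(int)
result = merged.drop(columns="PRICE")

print(result)
```

Because the grouping and the merge are both vectorized, no explicit Python loop over the 720 or 10,000 rows is needed.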
