Removing Duplicate columns while doing pandas merge

Question

i have a table in pandas df

id   product_1   product_2   count  
1        100       200          10
2        200       600          20
3        100       500          30
4        400       100          40
5        500       700          50
6        200       500          60
7        100       400          70

also i have another table in dataframe df2

product    price
100         5
200         10
300         15
400         20
500         25
600         30
700         35

i wanted to merge df2 with df1 such that i get price_x and price_y as columns

and then again divide price_y/price_x to get final column as perc_diff.

so i tried to do merge using.

# Add prices for products 1 and 2
df3 = (df1.
       merge(df2, left_on='product_1', right_on='product').
       merge(df2, left_on='product_2', right_on='product'))

# Calculate the percent difference
df3['perc_diff'] = (df3.price_y - df3.price_x) / df3.price_x

But when i did merge i got multiple columns of product_1 and product_2

for eg. my df3.head(1) after merging is:

id  product_1   product_2   count  product_1   product_2   price_x   price_y 
1       100       200         10    100          200          5        10

So how do i remove these multiple column's of product_1 & product_2 while merging or after merging?

piRSquared · Accepted Answer

df2_ = df2.set_index('product')
df3 = df.join(df2_, on='product_1') \
        .join(df2_, on='product_2', lsuffix='_x', rsuffix='_y')

df3.assign(perc_diff=df3.price_y.div(df3.price_x).sub(1))

Removing Duplicate columns while doing pandas merge

Answers (2)

Related Questions