Shubham R
Shubham R

Reputation: 7644

Removing Duplicate columns while doing pandas merge

i have a table in pandas df

id   product_1   product_2   count  
1        100       200          10
2        200       600          20
3        100       500          30
4        400       100          40
5        500       700          50
6        200       500          60
7        100       400          70

also i have another table in dataframe df2

product    price
100         5
200         10
300         15
400         20
500         25
600         30
700         35

i wanted to merge df2 with df1 such that i get price_x and price_y as columns

and then again divide price_y/price_x to get final column as perc_diff.

so i tried to do merge using.

# Add prices for products 1 and 2
df3 = (df1.
       merge(df2, left_on='product_1', right_on='product').
       merge(df2, left_on='product_2', right_on='product'))

# Calculate the percent difference
df3['perc_diff'] = (df3.price_y - df3.price_x) / df3.price_x

But when i did merge i got multiple columns of product_1 and product_2

for eg. my df3.head(1) after merging is:

id  product_1   product_2   count  product_1   product_2   price_x   price_y 
1       100       200         10    100          200          5        10       

So how do i remove these multiple column's of product_1 & product_2 while merging or after merging?

Upvotes: 2

Views: 4961

Answers (2)

jezrael
jezrael

Reputation: 863611

For removing column is necessary rename:

df3 = df1.merge(df2, left_on='product_1', right_on='product') \
         .merge(df2.rename(columns={'product':'product_2'}), on='product_2')

#borrow from piRSquared solution
df3 = df3.assign(perc_diff=df3.price_y.div(df3.price_x).sub(1))
print (df3)
   id  product_1  product_2  count  product  price_x  price_y  perc_diff
0   1        100        200     10      100        5       10       1.00
1   3        100        500     30      100        5       25       4.00
2   6        200        500     60      200       10       25       1.50
3   7        100        400     70      100        5       20       3.00
4   2        200        600     20      200       10       30       2.00
5   4        400        100     40      400       20        5      -0.75
6   5        500        700     50      500       25       35       0.40

Upvotes: 2

piRSquared
piRSquared

Reputation: 294516

df2_ = df2.set_index('product')
df3 = df.join(df2_, on='product_1') \
        .join(df2_, on='product_2', lsuffix='_x', rsuffix='_y')

df3.assign(perc_diff=df3.price_y.div(df3.price_x).sub(1))

enter image description here

Upvotes: 4

Related Questions