Shubham R
Shubham R

Reputation: 7644

Complex Map operation on two dataframes in pandas with condition

i have a table in pandas df.

id    prod1    prod2    count
1      10       30        100
2      10       20        200
3      20       10        200
4      30       10        100
5      30       40        300

also i am having another table in df2

product  price   master_product
  1000       1          10
  5000       2          10
  2000       2          20
  9000       5          20
  8000       1          20
  30         3          0
  4000       4          50

Check if prod1 and prod2 belongs to the values in master_product,

if yes i want to replace prod1 and prod2 in my first df with the cheapest product in my master_product.

if the prod1 and prod2 values donot match with the values in master_product, leave the values as it is.

i am looking for the final table as.

id    prod1    prod2    count
1      1000    4000        100
2      1000    8000        200
3      8000    1000        200
4      30       1000        100   #since 30 is not in master_product,leave as it is
5      30       40        300

i was trying to use .map function to achieve this but i could only reach to this.

df['prod1'] = df['prod1'].map(df2.set_index('master_product')['product'])
df['prod2'] = df['prod2'].map(df2.set_index('master_product')['product'])

but it will try to replace every values in prod1 and prod2 with matching values in master_product from df2.

Any ideas how to achieve this?

Upvotes: 2

Views: 330

Answers (1)

jezrael
jezrael

Reputation: 863301

You can first modify df1 for minimal price by master_product by groupby with idxmin - get all indices with minimal price:

df1 = df1.loc[df1.groupby('master_product')['price'].idxmin()]
print (df1)
   product  price  master_product
5       30      3               0
0     1000      1              10
4     8000      1              20
6     4000      4              50

Create dict for mapping:

d = df1.set_index('master_product')['product'].to_dict()
print (d)
{0: 30, 10: 1000, 20: 8000, 50: 4000}

Last map and if value is missing add it by combine_first:

df.prod1 = df.prod1.map(d).combine_first(df.prod1)
df.prod2 = df.prod2.map(d).combine_first(df.prod2)
print (df)
   id   prod1   prod2  count
0   1  1000.0    30.0    100
1   2  1000.0  8000.0    200
2   3  8000.0  1000.0    200
3   4    30.0  1000.0    100
4   5    30.0    40.0    300

Upvotes: 2

Related Questions