octopusbones
octopusbones

Reputation: 89

Replace matching values from one dataframe with index value from another dataframe

I would like to replace the fruit strings in df1['fruit'] with the corresponding ID from df2['id']. Essentially I'm trying to create a lookup table for fruit.

df1

fruit store price
apple heb 0.99
orange heb 0.69
apple kroger 0.88
apple walmart 0.89
banana heb 0.59
banana walmart 0.59
orange kroger 0.39

df2

id fruit
0 apple
1 orange
2 banana

I'd like to modify df1 to look like this (preferably in-place):

df1

fruit store price
0 heb 0.99
1 heb 0.69
0 kroger 0.88
0 walmart 0.89
2 heb 0.59
2 walmart 0.59
1 kroger 0.39

This seems like a simple enough thing to do, but I'm pretty new to Pandas so I could use some help. All of the googling and stackoverflow-ing I've done hasn't quite answered the question yet. Also, eventually I'd also like to create a lookup dataframe to do the same kind of replacement of store name strings in df1['store']

Here are some things I've tried so far:

.loc df1.loc[df1['fruit'] == df2['fruit'], 'fruit'] = df2['id']

This results in the following error: ValueError: Can only compare identically-labeled Series objects

.where

df1.where(df1['fruit'] == df2['fruit'], other= df2['id'], inplace=True)

I know I have this all wrong, since where should be replacing values based on a false condition. Also, it doesn't work, so there's another clue I'm on the wrong path :)

.replace

I don't have this code anymore, but I tried converting df2 into a dict and replacing values in df1 conditionally.

I know I'm close! This is something I need to do a lot...replace values in one dataframe based on values from another dataframe. Any help would be greatly appreciated!

Thanks in advance :)

Upvotes: 0

Views: 1806

Answers (2)

Nk03
Nk03

Reputation: 14949

TRY:

df1['fruit'] = df1.fruit.map(dict(df2[['fruit','id']].values))

Upvotes: 1

zanga
zanga

Reputation: 719

This might not be the most pythonic way of doing this, and there is surely pandas built-in function to do this (I encourage you check the documentation) but here is a solution that works.

You can achieve that by creating a mapping function and applying it to your column.

Mapping function:

  • Creates a dict of correspondance between fruits and their codes from fd2
  • returns the associated value if existing or the original value if not.

here's the function:

def mapping(x):
    mapping_dict = df2.set_index("fruit")['id']
    try:
        return mapping_dict[x]
    except:
        return x #or whatever to indicate that the fruit is not in df2

Apply it to df1:

df1['fruit'] = df1['fruit'].apply(mapping)

Upvotes: 0

Related Questions