Reputation: 89
I would like to replace the fruit strings in df1['fruit'] with the corresponding ID from df2['id']. Essentially I'm trying to create a lookup table for fruit.
df1
fruit | store | price |
---|---|---|
apple | heb | 0.99 |
orange | heb | 0.69 |
apple | kroger | 0.88 |
apple | walmart | 0.89 |
banana | heb | 0.59 |
banana | walmart | 0.59 |
orange | kroger | 0.39 |
df2
id | fruit |
---|---|
0 | apple |
1 | orange |
2 | banana |
I'd like to modify df1 to look like this (preferably in-place):
df1
fruit | store | price |
---|---|---|
0 | heb | 0.99 |
1 | heb | 0.69 |
0 | kroger | 0.88 |
0 | walmart | 0.89 |
2 | heb | 0.59 |
2 | walmart | 0.59 |
1 | kroger | 0.39 |
This seems like a simple enough thing to do, but I'm pretty new to Pandas so I could use some help. All of the googling and stackoverflow-ing I've done hasn't quite answered the question yet. Also, eventually I'd also like to create a lookup dataframe to do the same kind of replacement of store name strings in df1['store']
Here are some things I've tried so far:
.loc
df1.loc[df1['fruit'] == df2['fruit'], 'fruit'] = df2['id']
This results in the following error: ValueError: Can only compare identically-labeled Series objects
.where
df1.where(df1['fruit'] == df2['fruit'], other= df2['id'], inplace=True)
I know I have this all wrong, since where
should be replacing values based on a false condition. Also, it doesn't work, so there's another clue I'm on the wrong path :)
.replace
I don't have this code anymore, but I tried converting df2 into a dict and replacing values in df1 conditionally.
I know I'm close! This is something I need to do a lot...replace values in one dataframe based on values from another dataframe. Any help would be greatly appreciated!
Thanks in advance :)
Upvotes: 0
Views: 1806
Reputation: 719
This might not be the most pythonic way of doing this, and there is surely pandas built-in function to do this (I encourage you check the documentation) but here is a solution that works.
You can achieve that by creating a mapping function and applying it to your column.
Mapping function:
fd2
here's the function:
def mapping(x):
mapping_dict = df2.set_index("fruit")['id']
try:
return mapping_dict[x]
except:
return x #or whatever to indicate that the fruit is not in df2
Apply it to df1:
df1['fruit'] = df1['fruit'].apply(mapping)
Upvotes: 0