Reputation: 23
I have two data frames in pandas python:
df1:
Fruit Origin
0 Apple Spain
1 Apple France
2 Apple Italy
3 Banana Germany
4 Banana Portugal
5 Grapes France
6 Grapes Spain
df2:
Fruit
0 Apple
1 Banana
2 Grapes
and I want to modify the column Fruit in the df1 by the index of each fruit in df2, the result I am looking for should look like:
df1:
Fruit Origin
0 0 Spain
1 0 France
2 0 Italy
3 1 Germany
4 1 Portugal
5 2 France
6 2 Spain
What I tried that works is:
df1['Fruit'] = df1.Fruit.apply(lambda x: df2.index[df2.Fruit == x])
However I am working with a big data set so it takes too much time, I am looking for a faster option to do this.
Upvotes: 1
Views: 53
Reputation: 20695
I'd suggest using join
. First we'll want to set the index of df2
to be the Fruits
column:
df2 = df2.reset_index().set_index('Fruit')
So that
index
Fruit
Apple 0
Banana 1
Grapes 2
Now we just write:
>>> df1.join(df2, on='Fruit')
Fruit Origin index
0 Apple Spain 0
1 Apple France 0
2 Apple Italy 0
3 Banana Germany 1
4 Banana Portugal 1
5 Grapes France 2
6 Grapes Spain 2
Upvotes: 1