Update a column in a data frame with the index of another column in another data frame in pandas python

Question

I have two data frames in pandas python:

df1:

     Fruit   Origin  
0    Apple   Spain
1    Apple   France
2    Apple   Italy
3    Banana  Germany
4    Banana  Portugal
5    Grapes  France
6    Grapes  Spain

df2:

     Fruit
0    Apple
1    Banana
2    Grapes

and I want to modify the column Fruit in the df1 by the index of each fruit in df2, the result I am looking for should look like:

df1:

     Fruit   Origin  
0    0       Spain
1    0       France
2    0       Italy
3    1       Germany
4    1       Portugal
5    2       France
6    2       Spain

What I tried that works is:

df1['Fruit'] = df1.Fruit.apply(lambda x: df2.index[df2.Fruit == x])

However I am working with a big data set so it takes too much time, I am looking for a faster option to do this.

jme · Accepted Answer

I'd suggest using join. First we'll want to set the index of df2 to be the Fruits column:

df2 = df2.reset_index().set_index('Fruit')

So that

        index
Fruit        
Apple       0
Banana      1
Grapes      2

Now we just write:

>>> df1.join(df2, on='Fruit')

    Fruit    Origin  index
0   Apple     Spain      0
1   Apple    France      0
2   Apple     Italy      0
3  Banana   Germany      1
4  Banana  Portugal      1
5  Grapes    France      2
6  Grapes     Spain      2

Update a column in a data frame with the index of another column in another data frame in pandas python

Answers (1)

Related Questions