Square9627
Square9627

Reputation: 939

Lookup values with different column names in pandas

I have 2 pandas dataframes df1 and df2

Name   No
A      1
A      2
B      5

Player Gender
A      F
B      M
C      F

I would like to create a new column sex in the df1 dataframe, using corresponding values from the column gender in df2. The columns used to look up are Name in df1 and Player in df2.

Really appreciate any help

Upvotes: 2

Views: 5221

Answers (1)

jezrael
jezrael

Reputation: 862541

Use map by df2 where is set_index from column Player:

df1['sex'] = df1.Name.map(df2.set_index('Player')['Gender'])
print (df1)
  Name  No sex
0    A   1   F
1    A   2   F
2    B   5   M

This is same as map by dict:

d = df2.set_index('Player')['Gender'].to_dict()
print (d)
{'A': 'F', 'B': 'M', 'C': 'F'}
df1['sex'] = df1.Name.map(d)
print (df1)
  Name  No sex
0    A   1   F
1    A   2   F
2    B   5   M

Or:

print (pd.merge(df1,df2, left_on='Name', right_on='Player')
         .rename(columns={'Gender':'sex'})
         .drop('Player', axis=1))

  Name  No sex
0    A   1   F
1    A   2   F
2    B   5   M

First is more faster:

In [46]: %timeit (pd.merge(df1,df2, left_on='Name', right_on='Player').rename(columns={'Gender':'sex'}).drop('Player', axis=1))
The slowest run took 4.53 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.53 ms per loop

In [47]: %timeit df1.Name.map(df2.set_index('Player')['Gender'])
The slowest run took 4.78 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 882 µs per loop

Upvotes: 3

Related Questions