Reputation: 939
I have 2 pandas dataframes df1
and df2
Name No
A 1
A 2
B 5
Player Gender
A F
B M
C F
I would like to create a new column sex
in the df1
dataframe, using corresponding values from the column gender
in df2
. The columns used to look up are Name
in df1
and Player
in df2
.
Really appreciate any help
Upvotes: 2
Views: 5221
Reputation: 862541
Use map
by df2
where is set_index
from column Player
:
df1['sex'] = df1.Name.map(df2.set_index('Player')['Gender'])
print (df1)
Name No sex
0 A 1 F
1 A 2 F
2 B 5 M
This is same as map
by dict
:
d = df2.set_index('Player')['Gender'].to_dict()
print (d)
{'A': 'F', 'B': 'M', 'C': 'F'}
df1['sex'] = df1.Name.map(d)
print (df1)
Name No sex
0 A 1 F
1 A 2 F
2 B 5 M
Or:
print (pd.merge(df1,df2, left_on='Name', right_on='Player')
.rename(columns={'Gender':'sex'})
.drop('Player', axis=1))
Name No sex
0 A 1 F
1 A 2 F
2 B 5 M
First is more faster:
In [46]: %timeit (pd.merge(df1,df2, left_on='Name', right_on='Player').rename(columns={'Gender':'sex'}).drop('Player', axis=1))
The slowest run took 4.53 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.53 ms per loop
In [47]: %timeit df1.Name.map(df2.set_index('Player')['Gender'])
The slowest run took 4.78 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 882 µs per loop
Upvotes: 3