Reputation: 617
I'm trying to do something with pandas that seems simple, but I'm stuck after several unsuccessful attempts.
Here's the situation. I have one DataFrame (let's call it streets) with only two columns: the street name and a gender related to it:
name gender
0 Abraham Lincoln Avenue undefined
1 Donald Trump Dead End undefined
2 Hillary Clinton Street undefined
...
1754 Ziggy Marley Boulevard undefined
I also have another DataFrame (let's call it fnames), which is very large. It has four columns:
gender gender_detail main_gender first_name
0 F Female Female Aaf
1 F Female Female Aafke
2 F Female Female Aafkea
3 M Male Male Aafko
...
40211 F Female Female Zyta
As you've probably guessed, I would like to use the 'first_name' column of fnames to check whether any of the first names appears in the 'name' column of streets.
If a first name is found, I update the 'gender' column in streets with the corresponding value from the 'gender' column of fnames. If not, I leave it as 'undefined'.
Obviously, I can't use two nested for loops because of the DataFrames' size. Is there a quick way to achieve this?
For example, should I create a dictionary with the first name as key and the gender as value to be more efficient?
PS: I don't know if it simplifies the problem, but both DataFrames are sorted alphabetically.
Upvotes: 2
Views: 86
Reputation: 862851
Yes, I think you can build a dict and use it with map on the first word of the name column, obtained with split (which splits on whitespace by default) and str[0]. Finally, replace the resulting NaN values with fillna:
print (df1)
name gender
0 Abraham Lincoln Avenue undefined
1 Donald Trump Dead End undefined
2 Hillary Clinton Street undefined
3 Aaf Street undefined
1754 Ziggy Marley Boulevard undefined
print (df2)
gender gender_detail main_gender first_name
0 F Female Female Aaf
1 F Female Female Aafke
2 F Female Female Aafkea
3 F Female Female Aafko
40211 F Female Female Zyta
d = df2.set_index('first_name')['gender'].to_dict()
print (d)
{'Zyta': 'F', 'Aaf': 'F', 'Aafkea': 'F', 'Aafke': 'F', 'Aafko': 'F'}
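One thing worth checking before relying on the dict: if a first name occurs more than once in df2 (for instance listed once as F and once as M), to_dict() silently keeps the last value. The question does not say whether that happens in the real data, so the sample below is purely hypothetical, and ambiguous is just an illustrative variable name. A minimal sketch of how to spot such names:

```python
import pandas as pd

# Hypothetical sample (not from the question): "Alex" appears under two genders.
fnames = pd.DataFrame({
    "gender": ["F", "M", "F"],
    "first_name": ["Aaf", "Alex", "Alex"],
})

# Building the dict the same way as above: a repeated key keeps its last value.
d = fnames.set_index("first_name")["gender"].to_dict()
print(d)

# Count distinct genders per first name and list the names with more than one.
n_genders = fnames.groupby("first_name")["gender"].nunique()
ambiguous = n_genders[n_genders > 1].index.tolist()
print(ambiguous)
```

If ambiguous is not empty, you may want to decide explicitly how to handle those names (drop them, or map them to 'undefined') before building the dict.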
print (df1['name'].str.split().str[0])
0 Abraham
1 Donald
2 Hillary
3 Aaf
1754 Ziggy
Name: name, dtype: object
df1['gender'] = df1['name'].str.split().str[0].map(d).fillna('undefined')
print (df1)
name gender
0 Abraham Lincoln Avenue undefined
1 Donald Trump Dead End undefined
2 Hillary Clinton Street undefined
3 Aaf Street F
1754 Ziggy Marley Boulevard undefined
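Note that the solution above only looks at the first word of each street name. If the first name can appear anywhere in the name (the question asks whether a first name "appears" in it), one possible variation is to split every name into words, look each word up, and keep the first hit per street. This is only a sketch under assumptions: the sample rows (including "Avenue Hillary Clinton") are made up for illustration, matching is exact and case-sensitive, and when several words match, the leftmost one wins.

```python
import pandas as pd

# Made-up sample data in the same shape as the question's DataFrames.
streets = pd.DataFrame({
    "name": ["Abraham Lincoln Avenue", "Avenue Hillary Clinton", "Aaf Street"],
    "gender": ["undefined"] * 3,
})
fnames = pd.DataFrame({
    "gender": ["F", "F", "M"],
    "first_name": ["Aaf", "Hillary", "Aafko"],
})

# Lookup Series: first_name -> gender (duplicates dropped so the index is unique).
lookup = fnames.drop_duplicates("first_name").set_index("first_name")["gender"]

# One row per word; the index still points back to the original street row.
tokens = streets["name"].str.split().explode()

# Map each word to a gender, drop words that are not known first names,
# then keep the first match found for each street.
matched = tokens.map(lookup).dropna()
first_hit = matched.groupby(level=0).first()

# Streets without any match fall back to 'undefined'.
streets["gender"] = first_hit.reindex(streets.index).fillna("undefined")
print(streets)
```

This stays vectorized, so it avoids the two nested loops, at the cost of a temporary Series with one row per word.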
Upvotes: 2