Reputation: 119
I'm currently wrangling a large data set of 2 million rows from Lyft for a Udacity project. The DataFrame looks like this:
id name latitude longitude
0 148.0 Horton St at 40th St 37.829705 -122.287610
1 376.0 Illinois St at 20th St 37.760458 -122.387540
2 453.0 Brannan St at 4th St 37.777934 -122.396973
3 182.0 19th Street BART Station 37.809369 -122.267951
4 237.0 Fruitvale BART Station 37.775232 -122.224498
5 NaN NaN 37.775232 -122.224498
As the last row shows, I have a lot of NaN values for id and name; however, latitude and longitude are almost never empty. My assumption is that I could recover the name from other rows that share the same combination of latitude and longitude.
Once I have the name, I would try filling the NaN values for id using name:
dict_id = dict(zip(df['name'], df['id']))
df['id'] = df['id'].fillna(df['name'].map(dict_id))
However, I am stuck because with latitude and longitude I have two values to match against the name.
Upvotes: 2
Views: 122
Reputation: 75080
You can left-merge the DataFrame with a copy of itself from which the rows with a missing name have been dropped (dropna), then rename the columns:
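# Left merge: every row picks up the known id/name for its coordinates
m = df.merge(df.dropna(subset=['name']), on=['latitude','longitude'],
             how='left', suffixes=('','_y'))
# drop(columns=...) replaces the positional axis argument, which newer pandas rejects
out = (m.drop(columns=['id','name']).rename(columns={'id_y':'id','name_y':'name'})
       .reindex(df.columns, axis=1))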
id name latitude longitude
0 148.0 Horton St at 40th St 37.829705 -122.287610
1 376.0 Illinois St at 20th St 37.760458 -122.387540
2 453.0 Brannan St at 4th St 37.777934 -122.396973
3 182.0 19th Street BART Station 37.809369 -122.267951
4 237.0 Fruitvale BART Station 37.775232 -122.224498
5 237.0 Fruitvale BART Station 37.775232 -122.224498
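One caveat with real data: if several rows with a known name share the same coordinates (likely with 2 million rows), the self-merge above returns one output row per match, so the result grows. A minimal sketch of a variant that avoids this is below. It assumes each latitude/longitude pair maps to a single station, so keeping the first known row per pair is safe, and it relies on exact float equality of the coordinates. The function name fill_from_coords and the _known suffix are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def fill_from_coords(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing id/name using rows that share the same latitude/longitude."""
    keys = ["latitude", "longitude"]
    # One known (id, name) pair per coordinate pair. Keeping the first match
    # assumes each coordinate pair belongs to exactly one station.
    lookup = (
        df.dropna(subset=["name"])
          .drop_duplicates(subset=keys)[keys + ["id", "name"]]
    )
    # Left merge keeps every original row once, in the original order,
    # because the lookup has unique keys.
    merged = df.merge(lookup, on=keys, how="left", suffixes=("", "_known"))
    out = df.copy()
    for col in ["id", "name"]:
        # Only fill the gaps; existing values are left untouched.
        out[col] = merged[col].fillna(merged[f"{col}_known"]).to_numpy()
    return out


# Small sample with a repeated station, mirroring the rows in the question.
df = pd.DataFrame({
    "id": [148.0, 237.0, 237.0, np.nan],
    "name": ["Horton St at 40th St", "Fruitvale BART Station",
             "Fruitvale BART Station", np.nan],
    "latitude": [37.829705, 37.775232, 37.775232, 37.775232],
    "longitude": [-122.287610, -122.224498, -122.224498, -122.224498],
})
filled = fill_from_coords(df)
print(filled)
```

Because the lookup table is de-duplicated before merging, the result has the same number of rows as the input. Rows whose coordinates never appear with a known name stay NaN.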
Upvotes: 4