Reputation: 269
I want to match/map the missing value in a dataframe based on another column. For example,
City State Country
Chicago IL United States
Boston MA United States
San Diego
Los Angeles CA United States
San Francisco
Sacramento
Vancouver BC Canada
So, if I want to fill in the empty cells of the provinces and countries of those three cities same as Los Angeles. What should I do?
Below is my code but I'm completely stuck in it.
CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
df.loc[df['City'] == CA_cities, 'State' = 'CA' and 'Country' = 'United States']
Any help will be greatly appreciated.
Upvotes: 1
Views: 57
Reputation: 863226
You can use groupby
with mask created by isin
, then replace NaN
s by back and forward filling:
CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
df = df.groupby(df['City'].isin(CA_cities)).apply(lambda x: x.ffill().bfill())
print (df)
City State Country
0 Chicago IL United States
1 Boston MA United States
2 San Diego CA United States
3 Los Angeles CA United States
4 San Francisco CA United States
5 Sacramento CA United States
6 Vancouver BC Canada
More general solution is create groups of cities, e.g. in dictionaries, swap keys
wih values and map
column:
print (df)
City State Country
0 Chicago IL United States
1 Chicago1 NaN NaN
2 Boston MA United States
3 San Diego NaN NaN
4 Los Angeles CA United States
5 San Francisco NaN NaN
6 Sacramento NaN NaN
7 Vancouver BC Canada
cities = {'CA': ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento'],
'IL':['Chicago','Chicago1']}
d = {k: oldk for oldk, oldv in cities.items() for k in oldv}
df = df.groupby(df['City'].map(d).fillna(df['City'])).apply(lambda x: x.ffill().bfill())
#slowier alternative
#df = df.groupby(df['City'].replace(d)).apply(lambda x: x.ffill().bfill())
print (df)
City State Country
0 Chicago IL United States
1 Chicago1 IL United States
2 Boston MA United States
3 San Diego CA United States
4 Los Angeles CA United States
5 San Francisco CA United States
6 Sacramento CA United States
7 Vancouver BC Canada
Detail:
print (df['City'].map(d).fillna(df['City']))
0 IL
1 IL
2 Boston
3 CA
4 CA
5 CA
6 CA
7 Vancouver
Name: City, dtype: object
print (d)
{'San Diego': 'CA', 'Los Angeles': 'CA', 'San Francisco': 'CA',
'Sacramento': 'CA', 'Chicago': 'IL', 'Chicago1': 'IL'}
Upvotes: 3
Reputation: 323326
Or just split it , and using fillna
.
CA_cities = ['SanDiego', 'LosAngeles', 'SanFrancisco', 'Sacramento']
s=df.loc[df.City.isin(CA_cities),:]
t=df.loc[~df.City.isin(CA_cities),:]
pd.concat([s.fillna({'State':'CA','Country':'UnitedStates'}),t])
Out[1023]:
City State Country
2 SanDiego CA UnitedStates
3 LosAngeles CA UnitedStates
4 SanFrancisco CA UnitedStates
5 Sacramento CA UnitedStates
0 Chicago IL UnitedStates
1 Boston MA UnitedStates
6 Vancouver BC Canada
Upvotes: 3