Reputation: 313
I have the following dataframe:
                        Region        Date  Confirmed  Deaths  Recovered  Latitude  Longitude
0         Mainland China Anhui  2020-01-22        1.0     0.0        0.0       NaN        NaN
1       Mainland China Beijing  2020-01-22       14.0     0.0        0.0       NaN        NaN
2     Mainland China Chongqing  2020-01-22        6.0     0.0        0.0       NaN        NaN
3        Mainland China Fujian  2020-01-22        1.0     0.0        0.0       NaN        NaN
4         Mainland China Gansu  2020-01-22        0.0     0.0        0.0       NaN        NaN
2825      Mainland China Anhui  2020-03-01      990.0     6.0      873.0   31.8257   117.2264
567       Mainland China Anhui  2020-02-05        1.0     0.0        0.0       NaN        NaN
2951      Mainland China Anhui  2020-03-02      990.0     6.0      917.0   31.8257   117.2264
4273     Mainland China Fujian  2020-03-07      296.0     1.0      295.0   26.0789   117.9874
4541     Mainland China Fujian  2020-03-07      296.0     1.0      295.0   26.0789   117.9874
I want to fill the NaN values in the Latitude and Longitude columns with the values found in other rows of the same region.
I tried:
df = df.groupby(['Region']).ffill()
df
But that only got me this:
         Date  Confirmed  Deaths  Recovered  Latitude  Longitude
0  2020-01-22        1.0     0.0        0.0       NaN        NaN
1  2020-01-22       14.0     0.0        0.0       NaN        NaN
2  2020-01-22        6.0     0.0        0.0       NaN        NaN
3  2020-01-22        1.0     0.0        0.0       NaN        NaN
4  2020-01-22        0.0     0.0        0.0       NaN        NaN
Thanks in advance!
Upvotes: 1
Views: 42
Reputation: 5730
You can back-fill and then forward-fill the missing values within each group.
# Run both fills inside each Region group so values never cross regions
df['Latitude'] = df.groupby('Region')['Latitude'].transform(lambda s: s.bfill().ffill())
df['Longitude'] = df.groupby('Region')['Longitude'].transform(lambda s: s.bfill().ffill())
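For anyone who wants to try the per-region fill on something small, here is a minimal sketch. The frame below is a made-up miniature of the question's data (shortened region names, only the coordinate columns), so treat it as an illustration rather than the asker's real dataset. It does the back-fill and forward-fill inside transform so that both steps stay within a region.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data: two regions,
# coordinates missing on some rows.
df = pd.DataFrame({
    "Region": ["Anhui", "Fujian", "Anhui", "Fujian", "Anhui"],
    "Latitude": [np.nan, np.nan, 31.8257, 26.0789, np.nan],
    "Longitude": [np.nan, np.nan, 117.2264, 117.9874, np.nan],
})

cols = ["Latitude", "Longitude"]

# Back-fill, then forward-fill, one region at a time.
# transform returns a result aligned to the original index,
# so it can be assigned straight back.
df[cols] = df.groupby("Region")[cols].transform(lambda s: s.bfill().ffill())

print(df)
```

Note that chaining a second fill after the groupby result (outside the lambda) would run on the whole column, so a region with no coordinates at all could pick up its neighbour's values.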
Upvotes: 1
Reputation: 148910
I would just use the fact that max ignores NaN values. Assuming each region has only one latitude/longitude pair, this should be enough:
df.loc[:,['Latitude', 'Longitude']] = df.groupby('Region')[['Latitude', 'Longitude']].transform('max')
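A small sketch of the same idea, with a check added in front. The sample frame and the nunique check are my own additions for illustration, not part of the question. The check is there because max only gives the right answer when a region never has two different coordinates.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: Gansu has no known coordinates at all.
df = pd.DataFrame({
    "Region": ["Anhui", "Fujian", "Anhui", "Fujian", "Gansu"],
    "Latitude": [np.nan, np.nan, 31.8257, 26.0789, np.nan],
    "Longitude": [np.nan, np.nan, 117.2264, 117.9874, np.nan],
})

cols = ["Latitude", "Longitude"]

# Sanity check: 'max' is only a safe stand-in when every region
# has at most one distinct non-NaN value per column.
distinct = df.groupby("Region")[cols].nunique()
assert (distinct <= 1).all().all()

# max skips NaN, so each row receives its region's known value.
df[cols] = df.groupby("Region")[cols].transform("max")

print(df)
```

A region with no known coordinates at all (Gansu here) stays NaN, which is probably what you want since there is nothing to fill it from.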
Upvotes: 2