Timo Vossen

Reputation: 313

Fill NaN with corresponding row value in Python

I have the following dataframe:

      Region                   Date         Confirmed   Deaths  Recovered   Latitude    Longitude
0     Mainland China Anhui     2020-01-22   1.0         0.0     0.0         NaN         NaN
1     Mainland China Beijing   2020-01-22   14.0        0.0     0.0         NaN         NaN
2     Mainland China Chongqing 2020-01-22   6.0         0.0     0.0         NaN         NaN
3     Mainland China Fujian    2020-01-22   1.0         0.0     0.0         NaN         NaN
4     Mainland China Gansu     2020-01-22   0.0         0.0     0.0         NaN         NaN
2825  Mainland China Anhui     2020-03-01   990.0       6.0     873.0       31.8257     117.2264
567   Mainland China Anhui     2020-02-05   1.0         0.0     0.0         NaN         NaN
2951  Mainland China Anhui     2020-03-02   990.0       6.0     917.0       31.8257     117.2264
4273  Mainland China Fujian    2020-03-07   296.0       1.0     295.0       26.0789     117.9874
4541  Mainland China Fujian    2020-03-07   296.0       1.0     295.0       26.0789     117.9874

I want to fill the NaN values in the Latitude and Longitude columns with the known value for the same region.

I tried:

df = df.groupby(['Region']).ffill()
df

But that only got me this:

        Date        Confirmed   Deaths  Recovered   Latitude    Longitude
0       2020-01-22  1.0         0.0     0.0         NaN         NaN
1       2020-01-22  14.0        0.0     0.0         NaN         NaN
2       2020-01-22  6.0         0.0     0.0         NaN         NaN
3       2020-01-22  1.0         0.0     0.0         NaN         NaN
4       2020-01-22  0.0         0.0     0.0         NaN         NaN
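Why the attempt falls short: `ffill` only carries values downward in the current row order, so a NaN that comes before the first known coordinate of its region (row 0 for Anhui) has nothing to copy and stays NaN. The grouped result also leaves out the `Region` key. A minimal sketch reproducing this with three Anhui rows from the sample above:

```python
import numpy as np
import pandas as pd

# Three Anhui rows in the question's row order: row 0 comes
# before any known coordinate for that region.
df = pd.DataFrame(
    {
        "Region": ["Mainland China Anhui"] * 3,
        "Latitude": [np.nan, 31.8257, np.nan],
    },
    index=[0, 2825, 567],
)

# Forward fill inside each Region group, in the current row order.
filled = df.groupby("Region").ffill()

print(filled.columns.tolist())      # the Region key is not part of the result
print(filled["Latitude"].tolist())  # row 0 is still NaN, row 567 is filled
```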

Thanks in advance!

Upvotes: 1

Views: 42

Answers (2)

Zaraki Kenpachi

Reputation: 5730

You can back-fill and then forward-fill within each region group:

# run both fills inside each Region group so values never cross regions
df['Latitude'] = df.groupby('Region')['Latitude'].transform(lambda s: s.bfill().ffill())
df['Longitude'] = df.groupby('Region')['Longitude'].transform(lambda s: s.bfill().ffill())
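Both fills have to run inside each group. If only the first fill is grouped and the second is chained onto the resulting Series (as in `groupby(...)['Latitude'].bfill().ffill()`, or the older `fillna(method='backfill').fillna(method='pad')` form), the forward fill runs over the whole column and can copy one region's coordinate into another. A small sketch contrasting the two, using `bfill()`/`ffill()` in place of the deprecated `fillna(method=...)`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Region": ["Mainland China Anhui", "Mainland China Anhui", "Mainland China Gansu"],
        "Latitude": [31.8257, np.nan, np.nan],
    }
)

# Pitfall: only bfill is grouped; the trailing ffill runs over the whole column.
leaky = df.groupby("Region")["Latitude"].bfill().ffill()

# Both fills run per group via transform, so Gansu keeps its NaN.
safe = df.groupby("Region")["Latitude"].transform(lambda s: s.bfill().ffill())

print(leaky.tolist())  # Gansu wrongly receives Anhui's latitude
print(safe.tolist())   # Gansu stays NaN because it has no known value
```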

Upvotes: 1

Serge Ballesta

Reputation: 148910

I would just use the fact that max ignores NaN values. As long as each region has only one pair of coordinates, this should be enough:

df.loc[:,['Latitude', 'Longitude']] = df.groupby('Region')[['Latitude', 'Longitude']].transform('max')
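A runnable sketch of this approach on a few rows taken from the question's sample. `transform('max')` returns one value per row, aligned to the original index, so it can be assigned straight back. A region with no known coordinate (Gansu here) stays NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Region": [
            "Mainland China Anhui",
            "Mainland China Anhui",
            "Mainland China Fujian",
            "Mainland China Gansu",
        ],
        "Latitude": [np.nan, 31.8257, 26.0789, np.nan],
        "Longitude": [np.nan, 117.2264, 117.9874, np.nan],
    },
    index=[0, 2825, 4273, 4],
)

cols = ["Latitude", "Longitude"]
# Every row receives its region's max, which skips NaN within the group.
df.loc[:, cols] = df.groupby("Region")[cols].transform("max")
print(df)
```

Note that this overwrites every cell, not just the NaN ones. If coordinates could differ within a region, `df[cols] = df[cols].fillna(df.groupby("Region")[cols].transform("max"))` only touches the missing cells.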

Upvotes: 2
