user3206440

Reputation: 5049

Pandas - fill missing lat/long coordinates by interpolation

With a dataframe like the one below:

Time           Lat         Long
19:24:52.135   35.61067    139.630228
19:24:52.183   NaN         NaN
19:24:52.281   NaN         NaN
19:24:52.378   NaN         NaN
19:24:52.466   35.610692   139.630428

I need to fill in the NaN values in the Lat and Long fields so that the rows with NaN Lat/Long values:

  1. fall on a straight line between the previous non-NaN lat/long point (say x1,y1) and the next non-NaN point (say x2,y2), and
  2. are spaced equally between those two points.

In the above case there are three rows with NaN for Lat/Long, so they need to take 3 equally spaced points between the two non-NaN rows.
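For the numbers in the question, the three equally spaced values are simply the interior points of a 5-point linear grid running from the previous known value to the next one, i.e. x1 + k * (x2 - x1) / 4 for k = 1, 2, 3. A minimal numpy sketch, assuming the Lat/Long values from the table above; the helper name `equally_spaced` is hypothetical:

```python
import numpy as np

def equally_spaced(start, end, n_missing):
    # Hypothetical helper: returns n_missing points strictly between start and end,
    # i.e. start + k * (end - start) / (n_missing + 1) for k = 1..n_missing.
    # linspace includes both endpoints, so they are sliced off.
    return np.linspace(start, end, n_missing + 2)[1:-1]

# Values taken from the rows before and after the NaN gap in the question.
lat_fill = equally_spaced(35.61067, 35.610692, 3)
long_fill = equally_spaced(139.630228, 139.630428, 3)
print(lat_fill)
print(long_fill)
```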

Is there a way to achieve this with pandas or should it be done outside?

Update:

I tried df.interpolate(), as suggested in the comments, and it works!

Upvotes: 3

Views: 1677

Answers (2)

user3206440

Reputation: 5049

I tried df.interpolate(), as suggested in the comments, and it works. Both calls below give the same result, because method='linear' is the default:

(Pdb) df["Long"].interpolate(method='linear')
0    139.630228
1    139.630278
2    139.630328
3    139.630378
4    139.630428
Name: Long, dtype: float64
(Pdb) df["Long"].interpolate()
0    139.630228
1    139.630278
2    139.630328
3    139.630378
4    139.630428
Name: Long, dtype: float64
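The same call can be applied to both coordinate columns at once. A sketch assuming the question's data, with Time kept as a plain string column so that only Lat and Long are interpolated:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Time": ["19:24:52.135", "19:24:52.183", "19:24:52.281",
             "19:24:52.378", "19:24:52.466"],
    "Lat": [35.61067, np.nan, np.nan, np.nan, 35.610692],
    "Long": [139.630228, np.nan, np.nan, np.nan, 139.630428],
})

# method='linear' ignores the index values and treats the rows as equally
# spaced, which matches the "spaced equally" requirement in the question.
# Each column is interpolated independently.
df[["Lat", "Long"]] = df[["Lat", "Long"]].interpolate(method="linear")
print(df)
```

By default, NaNs after the last known point are filled with that last value; passing limit_area='inside' restricts filling to gaps between known points. If the points should instead be placed in proportion to the elapsed time between readings, interpolate(method='time') does that, but it requires a DatetimeIndex (e.g. after parsing Time with pd.to_datetime and calling set_index).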

Upvotes: 2

theletz

Reputation: 1805

You can try this (this is the solution for Lat; the same can be done for Long):

import numpy as np
import pandas as pd

df = pd.DataFrame({'Lat': [35.61069, np.nan, np.nan, np.nan, 35.610692], 'Long': [139.630428, np.nan, np.nan, np.nan, 139.630428]})

df
         Lat        Long
0  35.610690  139.630428
1        NaN         NaN
2        NaN         NaN
3        NaN         NaN
4  35.610692  139.630428

Let's create a new column holding the last non-NaN Lat value before each row:

df['Lat_shift'] = df['Lat'].shift()           # previous row's value
df['Lat_shift'] = df['Lat_shift'].ffill()     # carry the last known value forward

df
         Lat        Long  Lat_shift
0  35.610690  139.630428        NaN
1        NaN         NaN   35.61069
2        NaN         NaN   35.61069
3        NaN         NaN   35.61069
4  35.610692  139.630428   35.61069

Now we can calculate whatever metric we want; here, the midpoint between the previous known value and the current one:

df['Lat_new'] = df['Lat_shift'] + (df['Lat'] - df['Lat_shift'])/2

df

         Lat        Long  Lat_shift    Lat_new
0  35.610690  139.630428        NaN        NaN
1        NaN         NaN   35.61069        NaN
2        NaN         NaN   35.61069        NaN
3        NaN         NaN   35.61069        NaN
4  35.610692  139.630428   35.61069  35.610691

and use it to fill the NaN values (backfilling, so each row in the gap takes the value computed at the end of the gap):

df.loc[df['Lat'].isna(), 'Lat'] = df['Lat_new'].bfill()

df = df.drop(columns=['Lat_shift', 'Lat_new'])
df

         Lat        Long
0  35.610690  139.630428
1  35.610691         NaN
2  35.610691         NaN
3  35.610691         NaN
4  35.610692  139.630428
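Note that this fills every row in the gap with the same midpoint (35.610691), so the points are not spaced equally as the question asks. Below is a sketch that keeps the same idea of carrying the previous and next known values into each gap row, and adds each row's position within its gap so the points are spread evenly. The function name fill_equally_spaced is hypothetical; for gaps between known values, df['Lat'].interpolate() should give the same numbers.

```python
import numpy as np
import pandas as pd

def fill_equally_spaced(s):
    """Hypothetical helper: fill NaN runs between known values with equally spaced points."""
    known = s.notna()
    prev_val = s.ffill()   # last known value at or before each row
    next_val = s.bfill()   # next known value at or after each row
    gap_id = known.cumsum()  # a known row and the NaNs after it share one id
    # position of each row counted from the previous known value (0 for the known row)
    pos = gap_id.groupby(gap_id).cumcount()
    # rows per group = number of steps from the previous known value to the next one
    n_steps = gap_id.map(gap_id.value_counts())
    interp = prev_val + (next_val - prev_val) * pos / n_steps
    # leading/trailing NaNs have no prev/next value, so interp stays NaN there
    return s.fillna(interp)

df = pd.DataFrame({'Lat': [35.61069, np.nan, np.nan, np.nan, 35.610692],
                   'Long': [139.630428, np.nan, np.nan, np.nan, 139.630428]})
df[['Lat', 'Long']] = df[['Lat', 'Long']].apply(fill_equally_spaced)
print(df)
```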

I hope it helps :)

Upvotes: 0
