Reputation: 25
I have two dataframes, one is a continuous data(taken every 15 seconds) from a buoy in a fixed location and the other is observation data taken at many different time intervals in different places. Both have consistent timestamps in their respective dataframes. For every observation I need to take the timestamp of the observation and find the closest timestamp in the continuous dataframe, extract information from that row, and add it to the observation. I am having a hard time finding a way to find the closest timestamp in the continuous data with my series.
Observation data:
counter depth latdeg latmin latdec londeg lonmin ts
0 100001 21.110 72 18.5412 72.309020 -148 -47.071 2018-03-20 17:21:49+01:00
1 100002 22.140 72 18.5448 72.309080 -148 -47.0785 2018-03-20 17:22:07+01:00
2 100003 45.300 72 18.5396 72.308993 -148 -47.0936 2018-03-20 17:34:38+01:00
3 100004 45.310 72 18.5360 72.308933 -148 -47.0974 2018-03-20 17:36:31+01:00
Continuous data:
sec sat lat long alt time
4164 62460 9 72.31061472 -148.790606 -6.9809 2018-03-20 17:21:00+01:00
4165 62475 9 72.31061655 -148.7906147 -7.0923 2018-03-20 17:21:15+01:00
4166 62490 9 72.31061099 -148.7906154 -7.7008 2018-03-20 17:21:30+01:00
4167 62505 9 72.31060295 -148.7906237 -8.3746 2018-03-20 17:21:45+01:00
4168 62520 9 72.31059877 -148.7906251 -7.5989 2018-03-20 17:22:00+01:00
so for example I would like to take obs[0]['ts']
and find the index of the closest time in cont['time']
, which would be 4167
, to then append the lat long and alt to the observation dataframe.
Upvotes: 2
Views: 2029
Reputation: 59519
You are looking for pandas.merge_asof
It allows you to join two DataFrames on keys that are not exact. In this case, you want to use it with direciton = nearest
to match based on the two closest timestamps.
import pandas as pd
pd.merge_asof(df_obs, df_cont[['lat', 'long', 'alt', 'time']],
left_on='ts', right_on='time', direction='nearest')
Outputs:
counter depth latdeg latmin latdec londeg lonmin ts lat long alt time
0 100001 21.11 72 18.5412 72.309020 -148 -47.0710 2018-03-20 16:21:49 72.310603 -148.790624 -8.3746 2018-03-20 16:21:45
1 100002 22.14 72 18.5448 72.309080 -148 -47.0785 2018-03-20 16:22:07 72.310599 -148.790625 -7.5989 2018-03-20 16:22:00
2 100003 45.30 72 18.5396 72.308993 -148 -47.0936 2018-03-20 16:34:38 72.310599 -148.790625 -7.5989 2018-03-20 16:22:00
3 100004 45.31 72 18.5360 72.308933 -148 -47.0974 2018-03-20 16:36:31 72.310599 -148.790625 -7.5989 2018-03-20 16:22:00
You can then drop the time
column if you don't want it, I just left it in to make it clear how the merge worked.
Upvotes: 1