Patricia Nelsen
Patricia Nelsen

Reputation: 25

Extract information from closest timestamp in separate dataframe pandas

I have two dataframes, one is a continuous data(taken every 15 seconds) from a buoy in a fixed location and the other is observation data taken at many different time intervals in different places. Both have consistent timestamps in their respective dataframes. For every observation I need to take the timestamp of the observation and find the closest timestamp in the continuous dataframe, extract information from that row, and add it to the observation. I am having a hard time finding a way to find the closest timestamp in the continuous data with my series.

Observation data:

   counter    depth  latdeg   latmin     latdec  londeg  lonmin    ts
0      100001   21.110      72  18.5412  72.309020    -148 -47.071 2018-03-20 17:21:49+01:00
1      100002   22.140      72  18.5448  72.309080    -148 -47.0785 2018-03-20 17:22:07+01:00
2      100003   45.300      72  18.5396  72.308993    -148 -47.0936 2018-03-20 17:34:38+01:00
3      100004   45.310      72  18.5360  72.308933    -148 -47.0974 2018-03-20 17:36:31+01:00

Continuous data:

    sec sat lat long    alt time
4164    62460   9   72.31061472 -148.790606 -6.9809 2018-03-20 17:21:00+01:00
4165    62475   9   72.31061655 -148.7906147    -7.0923 2018-03-20 17:21:15+01:00
4166    62490   9   72.31061099 -148.7906154    -7.7008 2018-03-20 17:21:30+01:00
4167    62505   9   72.31060295 -148.7906237    -8.3746 2018-03-20 17:21:45+01:00
4168    62520   9   72.31059877 -148.7906251    -7.5989 2018-03-20 17:22:00+01:00

so for example I would like to take obs[0]['ts'] and find the index of the closest time in cont['time'], which would be 4167, to then append the lat long and alt to the observation dataframe.

Upvotes: 2

Views: 2029

Answers (1)

ALollz
ALollz

Reputation: 59519

You are looking for pandas.merge_asof

It allows you to join two DataFrames on keys that are not exact. In this case, you want to use it with direciton = nearest to match based on the two closest timestamps.

import pandas as pd

pd.merge_asof(df_obs, df_cont[['lat', 'long', 'alt', 'time']], 
              left_on='ts', right_on='time', direction='nearest')

Outputs:

   counter  depth  latdeg   latmin     latdec  londeg   lonmin                  ts        lat        long     alt                time
0   100001  21.11      72  18.5412  72.309020    -148 -47.0710 2018-03-20 16:21:49  72.310603 -148.790624 -8.3746 2018-03-20 16:21:45
1   100002  22.14      72  18.5448  72.309080    -148 -47.0785 2018-03-20 16:22:07  72.310599 -148.790625 -7.5989 2018-03-20 16:22:00
2   100003  45.30      72  18.5396  72.308993    -148 -47.0936 2018-03-20 16:34:38  72.310599 -148.790625 -7.5989 2018-03-20 16:22:00
3   100004  45.31      72  18.5360  72.308933    -148 -47.0974 2018-03-20 16:36:31  72.310599 -148.790625 -7.5989 2018-03-20 16:22:00

You can then drop the time column if you don't want it, I just left it in to make it clear how the merge worked.

Upvotes: 1

Related Questions