Matthias

Reputation: 5764

Python, Pandas: join dataframes on timestamp and offset

This is really two questions, but let me try to explain.

I have two data sources (e.g. dataframes). One contains the GPS coordinates and timestamps of a driven track. The other contains specific points of interest, each given by a time offset but without GPS coordinates. The task is to find the GPS coordinates of the points of interest.

Here's what I have. The dataframe with the GPS coordinates is resampled to 1-second intervals, and missing values are filled by linear interpolation:

df_resampled.head()

With this output:

Time, LongitudeDegrees, LatitudeDegrees, DistanceMeters, AltitudeMeters

2017-08-24 13:33:46, 15.457636, 47.047332, 0.0, 348.600006
2017-08-24 13:33:47, 15.457668, 47.047288, 0.0, 348.600006
2017-08-24 13:33:48, 15.457699, 47.047244, 0.0, 348.600006
2017-08-24 13:33:49, 15.457973, 47.047136, 0.0, 348.600006

Remark: All columns are float, while the index has a NumPy datetime dtype (datetime64). Conversion and resampling were done as follows:

# convert numeric types first
df = df.apply(pd.to_numeric, errors='ignore')

# convert timestamp to datetime and set the index
df['Time'] = pd.to_datetime(df['Time'])
df.set_index('Time', inplace=True)

# resample to seconds and linear fill of missing values
df_downscaled = df.resample('S').mean()
df_resampled = df_downscaled.interpolate(method='linear')

The other dataframe contains the points of interest and no timestamp but a time-offset:

df_poi.head()

giving this output:

index, time_offset, value

0, 00:25, 60
1, 01:18, 60
2, 01:30, 100
3, 01:55, 100
4, 02:16, 100

Question & Task: Now I'm stuck. I was thinking of converting the time_offset column of the second dataframe to a period or datetime type and adding it to the first timestamp (start_time) of the first dataframe.

But a) I have no idea how to convert a string in mm:ss format (minutes:seconds) to a datetime offset (or period) and add it to another timestamp.

The start_time to which all offsets have to be added would be:

start_time = df_resampled.index[0]

Once the offset is converted to a real timestamp, I would join the second dataframe with the first one on the timestamp column.

Upvotes: 1

Views: 1479

Answers (1)

unutbu

Reputation: 880359

pd.to_timedelta can convert strings in HH:MM:SS format to timedelta64s. Since your strings are in MM:SS format, you could use

df_poi['time_offset'] = pd.to_timedelta('00:' + df_poi['time_offset'])

to convert them to timedelta64s. (The ('00:' + df_poi['time_offset']) adds 00: to each string in df_poi['time_offset'] to form a new string in HH:MM:SS format.)
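
As a rough sketch of an alternative, the minutes and seconds could also be split apart and converted separately, which avoids building HH:MM:SS strings. The sample offsets are taken from the question; the names offsets, via_prefix, parts, and via_split are made up for illustration.

```python
import pandas as pd

# Sample offsets in MM:SS format, as in the question's df_poi['time_offset'].
offsets = pd.Series(['00:25', '01:18', '02:16'])

# Approach from above: prefix '00:' so each string reads as HH:MM:SS.
via_prefix = pd.to_timedelta('00:' + offsets)

# Alternative: split into integer minutes and seconds, convert each part.
parts = offsets.str.split(':', expand=True).astype(int)
via_split = (pd.to_timedelta(parts[0], unit='min')
             + pd.to_timedelta(parts[1], unit='s'))

# Both approaches give the same offsets in seconds.
print(via_prefix.dt.total_seconds().tolist())  # [25.0, 78.0, 136.0]
print((via_prefix == via_split).all())         # True
```

The string-prefix version is shorter. The split version may be easier to adapt if the offsets ever arrive in a different layout.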

You can then add the start time to df_poi['time_offset'] to get absolute timestamps (df here plays the role of your df_resampled):

start = df.index[0]
df_poi['Time'] = df_poi['time_offset'] + start

Now you can merge the two DataFrames:

result = pd.merge(df, df_poi, how='right', left_index=True, right_on='Time')

For example,

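import pandas as pd
import numpy as np

np.random.seed(2017)
N, M = 1000, 5

# Synthetic track: one random position per second, indexed by timestamp
df = pd.DataFrame({'Time': pd.date_range('2017-8-24 13:33:46', periods=N, freq='S'),
                   'Long': np.random.uniform(0, 360, size=N),
                   'Lat': np.random.uniform(-90, 90, size=N)})
df = df.set_index('Time')

# Points of interest, given as MM:SS offsets from the start of the track
df_poi = pd.DataFrame({'time_offset': ['00:25', '01:18', '01:30', '01:55', '02:16'],
                       'value': np.random.randint(100, size=M)})

# Prefix '00:' to get HH:MM:SS strings, then convert to timedeltas
df_poi['time_offset'] = pd.to_timedelta('00:' + df_poi['time_offset'])
# Turn the offsets into absolute timestamps
start = df.index[0]
df_poi['Time'] = df_poi['time_offset'] + start
# Keep only the track rows whose timestamp matches a point of interest
result = pd.merge(df, df_poi, how='right', left_index=True, right_on='Time')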

yields

         Lat        Long time_offset  value                Time
0 -19.851775  276.063876    00:00:25     28 2017-08-24 13:34:11
1  22.399545   61.956233    00:01:18     68 2017-08-24 13:35:04
2  35.472442  275.165153    00:01:30     56 2017-08-24 13:35:16
3 -60.588755   91.961901    00:01:55      2 2017-08-24 13:35:41
4  34.339641    4.033255    00:02:16     75 2017-08-24 13:36:02
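
The exact merge works here because the track has a row for every second. If the track were sampled more coarsely, unmatched points of interest would get NaN coordinates. One possible workaround is pd.merge_asof with direction='nearest', sketched below. The 3-second sampling, the linear coordinates, the 2-second tolerance, and the names track, poi, and nearest are all assumptions for illustration, not part of the question's data.

```python
import pandas as pd

start = pd.Timestamp('2017-08-24 13:33:46')

# Hypothetical track sampled every 3 seconds (an assumption), so most
# offsets do not fall exactly on a track timestamp.
track = pd.DataFrame({'offset_s': list(range(0, 300, 3))})
track['Time'] = start + pd.to_timedelta(track['offset_s'], unit='s')
track['Long'] = 15.0 + 0.001 * track['offset_s']
track['Lat'] = 47.0 + 0.001 * track['offset_s']

# Points of interest with MM:SS offsets, converted to absolute timestamps.
poi = pd.DataFrame({'time_offset': ['00:25', '01:18', '02:16'],
                    'value': [60, 60, 100]})
poi['Time'] = start + pd.to_timedelta('00:' + poi['time_offset'])

# merge_asof needs both key columns sorted and of the same dtype.
track['Time'] = track['Time'].astype('datetime64[ns]')
poi['Time'] = poi['Time'].astype('datetime64[ns]')

# Match each point of interest to the closest track row within 2 seconds.
nearest = pd.merge_asof(poi, track, on='Time', direction='nearest',
                        tolerance=pd.Timedelta('2s'))
print(nearest[['time_offset', 'offset_s', 'Long', 'Lat']])
```

With these made-up numbers the offsets of 25, 78, and 136 seconds are matched to the track rows at 24, 78, and 135 seconds. The tolerance keeps a point of interest from being paired with a track row that is too far away in time.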

Upvotes: 1
