jspaeth

Reputation: 335

Concatenating two DataFrames with respect to dates

I think my problem involves a few parts. What do I have?

What do I want in the end?

What did I try so far?

Here is the code line:

allTheData = pd.concat([gpsDataFrame, no2DataFrame], axis=1)

I am new to pandas and relatively new to Python. I hope you can help me with these two steps:

  1. Create a DataFrame 'allTheData' that contains all measured times in chronological order (from either GPS or NO2) together with the correct data. For example, if both DataFrames have data at 15:30.05, add only one row containing all 3 columns; if only GPS data exists at 15:30.07, include the GPS data and set NO2 to NaN or something similar.

  2. Interpolate the values so that I can choose a 1-second interval and get interpolated GPS and NO2 data for every second, i.e. in each row.

Upvotes: 1

Views: 223

Answers (1)

Graipher

Reputation: 7186

Use DataFrame.resample to bring the two DataFrames onto the same timestamp index:

import pandas as pd
import numpy as np

# generate some sample data according to your question
date1 = pd.date_range("14:00", "18:00", freq="3s")  # lowercase "s": the "S" alias is deprecated in pandas 2.2+
df1 = pd.DataFrame({"time": date1, "gps": np.random.rand(len(date1))})
date2 = pd.date_range("13:30", "18:30", freq="600ms")
df2 = pd.DataFrame({"time": date2, "no2": np.random.rand(len(date2))})

# set the timestamps as index
df1 = df1.set_index("time")
df2 = df2.set_index("time")

final_freq = "1s"  # lowercase "s": the "S" alias is deprecated in pandas 2.2+

# upsample df1, interpolating linearly between the original samples
# (without the interpolation, the newly created rows would be NaN)
df1 = df1.resample(final_freq).interpolate(method="linear")

# downsample df2, averaging
df2 = df2.resample(final_freq).mean()

Then you can just join them:

df = df1.join(df2)
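
The join above is a left join, so it keeps only the timestamps in df1's resampled index. If you also want the first step from the question (one row per measured time, NaN where a sensor has no reading), an outer join on the raw timestamps is one way to get there. This is only a minimal sketch: the sample values and the names gps, no2, all_data and filled are made up for illustration, and limit_area="inside" is my choice so that nothing gets extrapolated past the last real reading.

```python
import pandas as pd

# Hypothetical raw readings; the timestamps only partly overlap
gps = pd.DataFrame(
    {"gps": [1.0, 4.0]},
    index=pd.to_datetime(["2024-01-01 15:30:05", "2024-01-01 15:30:08"]),
)
no2 = pd.DataFrame(
    {"no2": [10.0, 20.0]},
    index=pd.to_datetime(["2024-01-01 15:30:05", "2024-01-01 15:30:07"]),
)

# Outer join: union of both indexes, sorted, NaN where a sensor has no reading
all_data = gps.join(no2, how="outer")

# Fill gaps between known readings, weighting by the actual time distance;
# limit_area="inside" leaves leading/trailing NaN untouched
filled = all_data.interpolate(method="time", limit_area="inside")
```

Here all_data has three rows (15:30:05, 15:30:07, 15:30:08), and filled only replaces the NaN values that lie between two real readings of the same column.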

Note that you might have to change this slightly if your GPS position is stored as a tuple in a single column. In that case you might have to split it into two columns, latitude and longitude, for the upsampling to work.
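
Splitting such a tuple column could look roughly like this. I am assuming the column is called gps and holds (latitude, longitude) tuples; the sample coordinates and the names df_gps, coords and coords_1s are invented for the example.

```python
import pandas as pd

# Hypothetical sample: a "gps" column holding (latitude, longitude) tuples
times = pd.date_range("2024-01-01 14:00:00", periods=3, freq="3s")
df_gps = pd.DataFrame(
    {"gps": [(48.10, 11.50), (48.13, 11.56), (48.16, 11.62)]},
    index=times,
)

# Expand the tuples into two numeric columns that can be interpolated
coords = pd.DataFrame(
    df_gps["gps"].tolist(),
    index=df_gps.index,
    columns=["latitude", "longitude"],
)

# Upsample to 1 second, interpolating each coordinate linearly
coords_1s = coords.resample("1s").interpolate(method="linear")
```

Linear interpolation of latitude and longitude separately should be fine over a few seconds of movement, but it is an approximation and not a proper path along the Earth's surface.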

Instead of averaging for the downsampling, it might make sense to use a different function. If, for example, your NO2 sensor reports how much NO2 it saw in the last 0.6 seconds, then you would want .sum() instead.
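
To make the difference concrete, here is a small sketch with made-up readings. I am using a 500 ms spacing instead of 600 ms purely so the numbers are easy to follow; no2_mean and no2_sum are just illustrative names.

```python
import pandas as pd

# Hypothetical NO2 readings every 500 ms
times = pd.date_range("2024-01-01 14:00:00", periods=5, freq="500ms")
no2 = pd.DataFrame({"no2": [1.0, 3.0, 2.0, 4.0, 5.0]}, index=times)

# Average of the readings in each 1-second bin (suits instantaneous values)
no2_mean = no2.resample("1s").mean()

# Total of the readings in each 1-second bin (suits accumulated amounts)
no2_sum = no2.resample("1s").sum()
```

With these readings the 1-second bins contain [1, 3], [2, 4] and [5], so the means are 2, 3 and 5 while the sums are 4, 6 and 5.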

Upvotes: 2
