user1889297

Reputation: 494

Merge DataFrames based on time ranges

I need to fill values in one DataFrame based on time ranges in another DataFrame. I can't use loops since that would take too long: there are millions of rows in the first DataFrame and thousands in the second. Example input:

import numpy as np
import pandas as pd
df1 = pd.DataFrame({'value1': np.random.random(14)})
df1['Time'] = pd.date_range('2022-1-1', periods=14, freq='1s')
df2 = pd.DataFrame({'start':['2022-01-01 00:00:03', '2022-01-01 00:00:8'], 'end':['2022-01-01 00:00:05', '2022-01-01 00:00:12'], 'Value2': [2, 5]})
df2.start, df2.end = pd.to_datetime(df2.start), pd.to_datetime(df2.end)

The inputs look like this:

[Image: Source 1]

[Image: time ranges]

And the result should look like this:

[Image: Target]

Upvotes: 0

Views: 43

Answers (1)

sammywemmy

Reputation: 28729

Since you have only two rows in df2, pd.merge_asof fits in nicely:

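# For each row of df1, attach the latest df2 row whose start <= Time (both frames must be sorted on these keys)
filtered = pd.merge_asof(df1, df2, left_on='Time', right_on='start')
# Keep only the rows whose Time also falls on or before that range's end
filtered = filtered.loc[filtered.Time.le(filtered.end), ['Time', 'value1', 'Value2']]
# Left-merge back onto the full list of times so unmatched rows show up as NaN
df1.filter(['Time']).merge(filtered, how='left')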

                  Time    value1  Value2
0  2022-01-01 00:00:00       NaN     NaN
1  2022-01-01 00:00:01       NaN     NaN
2  2022-01-01 00:00:02       NaN     NaN
3  2022-01-01 00:00:03  0.737995     2.0
4  2022-01-01 00:00:04  0.182492     2.0
5  2022-01-01 00:00:05  0.175452     2.0
6  2022-01-01 00:00:06       NaN     NaN
7  2022-01-01 00:00:07       NaN     NaN
8  2022-01-01 00:00:08  0.634401     5.0
9  2022-01-01 00:00:09  0.849432     5.0
10 2022-01-01 00:00:10  0.724455     5.0
11 2022-01-01 00:00:11  0.611024     5.0
12 2022-01-01 00:00:12  0.722443     5.0
13 2022-01-01 00:00:13       NaN     NaN

Hopefully, this is what you have in mind
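If the ranges in df2 never overlap, another option that might be worth trying is an interval lookup with pd.IntervalIndex. This is only a sketch under that assumption (with overlapping ranges get_indexer raises an error), and it treats both start and end as inclusive, as in the output above. The names intervals, pos and out are just illustrative, and unlike the merge above it keeps value1 for every row:

```python
import numpy as np
import pandas as pd

# Rebuild the example input from the question
df1 = pd.DataFrame({'value1': np.random.random(14)})
df1['Time'] = pd.date_range('2022-01-01', periods=14, freq='1s')
df2 = pd.DataFrame({
    'start': pd.to_datetime(['2022-01-01 00:00:03', '2022-01-01 00:00:08']),
    'end': pd.to_datetime(['2022-01-01 00:00:05', '2022-01-01 00:00:12']),
    'Value2': [2, 5],
})

# One closed interval [start, end] per row of df2 (assumed non-overlapping)
intervals = pd.IntervalIndex.from_arrays(df2['start'], df2['end'], closed='both')

# Position of the interval containing each Time, or -1 when none does
pos = intervals.get_indexer(df1['Time'])

# Look up Value2 by position; rows outside every range become NaN
value2 = df2['Value2'].to_numpy(dtype=float)
out = df1[['Time', 'value1']].copy()
out['Value2'] = np.where(pos >= 0, value2[pos], np.nan)
print(out)
```

The lookup is vectorized, so no Python-level loop over df1 is involved. Whether it beats merge_asof on millions of rows is something you would need to time on your own data.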

Upvotes: 2
