Reputation: 494
I need to fill values in one DataFrame based on time ranges in another DataFrame. I can't use loops since that would take too long: I have millions of rows in the first df and thousands in the second. Example input:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'value1': np.random.random(14)})
df1['Time'] = pd.date_range('2022-1-1', periods=14, freq='1s')
df2 = pd.DataFrame({'start': ['2022-01-01 00:00:03', '2022-01-01 00:00:08'],
                    'end': ['2022-01-01 00:00:05', '2022-01-01 00:00:12'],
                    'Value2': [2, 5]})
df2['start'], df2['end'] = pd.to_datetime(df2['start']), pd.to_datetime(df2['end'])
And the result should look like this:
[Image: time ranges]
Upvotes: 0
Views: 43
Reputation: 28729
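Since you have only two rows in df2, pd.merge_asof fits in nicely. It requires both frames to be sorted on their key columns (Time and start here), and it assumes the ranges in df2 do not overlap: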
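# For each Time, find the last row of df2 whose start is <= Time
filtered = pd.merge_asof(df1, df2, left_on='Time', right_on='start')
# Keep only rows where Time has not passed the end of the matched range
filtered = filtered.loc[filtered.Time.le(filtered.end), ['Time', 'value1', 'Value2']]
# Left-merge back on Time so rows outside every range show up as NaN
df1.filter(['Time']).merge(filtered, how='left')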
                  Time    value1  Value2
0  2022-01-01 00:00:00       NaN     NaN
1  2022-01-01 00:00:01       NaN     NaN
2  2022-01-01 00:00:02       NaN     NaN
3  2022-01-01 00:00:03  0.737995     2.0
4  2022-01-01 00:00:04  0.182492     2.0
5  2022-01-01 00:00:05  0.175452     2.0
6  2022-01-01 00:00:06       NaN     NaN
7  2022-01-01 00:00:07       NaN     NaN
8  2022-01-01 00:00:08  0.634401     5.0
9  2022-01-01 00:00:09  0.849432     5.0
10 2022-01-01 00:00:10  0.724455     5.0
11 2022-01-01 00:00:11  0.611024     5.0
12 2022-01-01 00:00:12  0.722443     5.0
13 2022-01-01 00:00:13       NaN     NaN
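If you would rather keep value1 for every row and blank out only Value2 outside the ranges, a variation on the same idea might look like the sketch below. The helper name fill_from_ranges is made up for illustration, the random data is seeded only so the run is repeatable, and the sketch assumes the ranges in df2 do not overlap.

```python
import numpy as np
import pandas as pd


def fill_from_ranges(df1, df2):
    """Attach df2['Value2'] to rows of df1 whose Time lies in [start, end].

    Hypothetical helper; assumes the ranges in df2 do not overlap.
    """
    # merge_asof needs both frames sorted by their key columns.
    left = df1.sort_values('Time')
    right = df2.sort_values('start')
    # For each Time, pick the latest range whose start is <= Time.
    out = pd.merge_asof(left, right, left_on='Time', right_on='start')
    # Keep Value2 only while Time has not passed the end of that range.
    # Rows before the first start have end = NaT, so the comparison is False.
    out['Value2'] = out['Value2'].where(out['Time'] <= out['end'])
    return out[['Time', 'value1', 'Value2']]


rng = np.random.default_rng(0)
df1 = pd.DataFrame({'value1': rng.random(14)})
df1['Time'] = pd.date_range('2022-01-01', periods=14, freq='1s')
df2 = pd.DataFrame({
    'start': pd.to_datetime(['2022-01-01 00:00:03', '2022-01-01 00:00:08']),
    'end': pd.to_datetime(['2022-01-01 00:00:05', '2022-01-01 00:00:12']),
    'Value2': [2, 5],
})

result = fill_from_ranges(df1, df2)
print(result)
```

This skips the second merge entirely, which may matter with millions of rows, but note that it differs from the output above: value1 is kept everywhere and only Value2 is NaN outside the ranges.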
Hopefully, this is what you have in mind
Upvotes: 2