Reputation: 1
I have two data frames.
I am trying to lookup **datetime **value in the df1 dataframe to see if it is between Start Time and end time columns in df2 and if that is true then create a new column in df1 with the stage number from the df2 dataframe.
I can do that with the iteration function but that is too time consuming for big files. What are other ways i can do this ?
Thank you in advance
Upvotes: 0
Views: 1981
Reputation: 30991
I created source DataFrames (df1 and df2) as:
datetime
0 2022-01-03 10:30:00
1 2022-01-03 11:20:00
2 2022-01-05 12:00:00
3 2022-01-05 08:00:00
4 2022-01-06 16:05:00
and
start time end time stage number
0 2022-01-03 10:00:00 2022-01-03 12:00:00 1
1 2022-01-04 10:00:00 2022-01-04 16:00:00 2
2 2022-01-05 10:00:00 2022-01-06 18:00:00 3
3 2022-01-10 10:00:00 2022-01-20 18:00:00 4
datetime, start time and end time columns are of datetime64[ns] type.
Note that 2022-01-05 08:00:00 in df1 is not contained between any pair of start / end times in df2. In such a case (no stage) the stage number will be set to 0.
Start with computing 3 auxiliary Numpy arrays:
st = df2['start time'].values
et = df2['end time'].values
dat = df1.datetime.values[:,None]
Then compute the new column as:
df1['stage'] = np.dot(np.logical_and(np.less_equal(st, dat),
np.less_equal(dat, et)), df2['stage number'])
The result is:
datetime stage
0 2022-01-03 10:30:00 1
1 2022-01-03 11:20:00 1
2 2022-01-05 12:00:00 3
3 2022-01-05 08:00:00 0
4 2022-01-06 16:05:00 3
Upvotes: 1