shubh
shubh

Reputation: 1

Lookup values from one Dataframe with another dataframe and then creating a new column in df1 based on if the condition is met

I have two data frames.

I am trying to lookup **datetime **value in the df1 dataframe to see if it is between Start Time and end time columns in df2 and if that is true then create a new column in df1 with the stage number from the df2 dataframe.

I can do that with the iteration function but that is too time consuming for big files. What are other ways i can do this ?

Thank you in advance

Upvotes: 0

Views: 1981

Answers (1)

Valdi_Bo
Valdi_Bo

Reputation: 30991

I created source DataFrames (df1 and df2) as:

             datetime
0 2022-01-03 10:30:00
1 2022-01-03 11:20:00
2 2022-01-05 12:00:00
3 2022-01-05 08:00:00
4 2022-01-06 16:05:00

and

           start time            end time  stage number
0 2022-01-03 10:00:00 2022-01-03 12:00:00             1
1 2022-01-04 10:00:00 2022-01-04 16:00:00             2
2 2022-01-05 10:00:00 2022-01-06 18:00:00             3
3 2022-01-10 10:00:00 2022-01-20 18:00:00             4

datetime, start time and end time columns are of datetime64[ns] type.

Note that 2022-01-05 08:00:00 in df1 is not contained between any pair of start / end times in df2. In such a case (no stage) the stage number will be set to 0.

Start with computing 3 auxiliary Numpy arrays:

st = df2['start time'].values
et = df2['end time'].values
dat = df1.datetime.values[:,None]

Then compute the new column as:

df1['stage'] = np.dot(np.logical_and(np.less_equal(st, dat),
    np.less_equal(dat, et)), df2['stage number'])

The result is:

             datetime  stage
0 2022-01-03 10:30:00      1
1 2022-01-03 11:20:00      1
2 2022-01-05 12:00:00      3
3 2022-01-05 08:00:00      0
4 2022-01-06 16:05:00      3

Upvotes: 1

Related Questions