Reputation: 105
Note: I checked the other articles (here) on mapping through merge but it doesnt work with intervals.
I have 2 dataframes where df1:
Column1,Column2
1,2021-06-07
2,2021-06-10
3,2021-06-25
and df2:
Status,Type,Start,End
A,Closed,2021-06-06,2021-06-17
B,Closed,2021-06-20,2021-07-01
C,Closed,2021-07-04,2021-07-22
D,Active,2021-07-25,2021-08-05
Expected result for df1 is
Column1,Column2,Status
1,2021-06-07,A
2,2021-06-10,A
3,2021-06-25,B
Where df2 is used as a map to search the given date is in between Start and End columns.
Upvotes: 0
Views: 99
Reputation: 13251
merge_asof
will give less excess columns to filter out~
# Ensure your date columns are proper datetimes with something like:
# df1.Column2 = pd.to_datetime(df1.Column2)
out = pd.merge_asof(df1, df2, left_on='Column2', right_on='Start')[lambda x: x.Column2.lt(x.End)]
print(out)
# Output:
Column1 Column2 Status Type Start End
0 1 2021-06-07 A Closed 2021-06-06 2021-06-17
1 2 2021-06-10 A Closed 2021-06-06 2021-06-17
2 3 2021-06-25 B Closed 2021-06-20 2021-07-01
Upvotes: 1
Reputation: 120559
Yes merge
can work for an interval using the cross
product of the two dataframes and filter out according your condition:
out = df1.merge(df2, how='cross').query('(Start <= Column2) & (Column2 <= End)')
print(out)
# Output
Column1 Column2 Status Type Start End
0 1 2021-06-07 A Closed 2021-06-06 2021-06-17
4 2 2021-06-10 A Closed 2021-06-06 2021-06-17
9 3 2021-06-25 B Closed 2021-06-20 2021-07-01
Upvotes: 1
Reputation: 4318
This is not an elegant solution but perhaps does the job:
for idx in df1.index:
status = None
for idx2 in df2.index:
start = pd.Timestamp(df2["Start"][idx2])
end = pd.Timestamp(df2["End"][idx2])
value = pd.Timestamp(df1["Column2"][idx])
if start <= value < end:
status = df2["Status"][idx2]
break
df1["Status"][idx] = status
Upvotes: 0