Murat Erenturk
Murat Erenturk

Reputation: 105

mapping the values in two data frames with intervals

Note: I checked the other articles (here) on mapping through merge but it doesnt work with intervals.

I have 2 dataframes where df1:

Column1,Column2
1,2021-06-07
2,2021-06-10
3,2021-06-25

and df2:

Status,Type,Start,End
A,Closed,2021-06-06,2021-06-17
B,Closed,2021-06-20,2021-07-01
C,Closed,2021-07-04,2021-07-22
D,Active,2021-07-25,2021-08-05

Expected result for df1 is

Column1,Column2,Status
1,2021-06-07,A
2,2021-06-10,A
3,2021-06-25,B

Where df2 is used as a map to search the given date is in between Start and End columns.

Upvotes: 0

Views: 99

Answers (3)

BeRT2me
BeRT2me

Reputation: 13251

merge_asof will give less excess columns to filter out~

# Ensure your date columns are proper datetimes with something like:
# df1.Column2 = pd.to_datetime(df1.Column2)

out = pd.merge_asof(df1, df2, left_on='Column2', right_on='Start')[lambda x: x.Column2.lt(x.End)]
print(out)

# Output:
   Column1    Column2 Status    Type      Start        End
0        1 2021-06-07      A  Closed 2021-06-06 2021-06-17
1        2 2021-06-10      A  Closed 2021-06-06 2021-06-17
2        3 2021-06-25      B  Closed 2021-06-20 2021-07-01

Upvotes: 1

Corralien
Corralien

Reputation: 120559

Yes merge can work for an interval using the cross product of the two dataframes and filter out according your condition:

out = df1.merge(df2, how='cross').query('(Start <= Column2) & (Column2 <= End)')
print(out)

# Output
   Column1     Column2 Status    Type       Start         End
0        1  2021-06-07      A  Closed  2021-06-06  2021-06-17
4        2  2021-06-10      A  Closed  2021-06-06  2021-06-17
9        3  2021-06-25      B  Closed  2021-06-20  2021-07-01

Upvotes: 1

Mehdi
Mehdi

Reputation: 4318

This is not an elegant solution but perhaps does the job:

for idx in df1.index:
    status = None
    for idx2 in df2.index:
        start = pd.Timestamp(df2["Start"][idx2])
        end = pd.Timestamp(df2["End"][idx2])
        value = pd.Timestamp(df1["Column2"][idx])
        if start <= value < end:
            status = df2["Status"][idx2]
            break
    df1["Status"][idx] = status

Upvotes: 0

Related Questions