scott martin
scott martin

Reputation: 1293

Pandas - Mapping two Dataframe based on date ranges

I am trying to categorise users based on their lifecycle. Given below Pandas dataframe shows the number of times a customer raised a ticket depending on how long they have used the product.

master dataframe

cust_id,start_date,end_date
101,02/01/2019,12/01/2019
101,14/02/2019,24/04/2019
101,27/04/2019,02/05/2019
102,25/01/2019,02/02/2019
103,02/01/2019,22/01/2019

Master lookup table

 start_date,end_date,project_name
 01/01/2019,13/01/2019,project_a
 14/01/2019,13/02/2019,project_b
 15/02/2019,13/03/2019,project_c
 14/03/2019,13/06/2019,project_d

I am trying to map the above two data frames such that I am able to add project_name to the master dataframe

Expected output:

cust_id,start_date,end_date,project_name
101,02/01/2019,12/01/2019,project_a
101,14/02/2019,24/04/2019,project_c
101,14/02/2019,24/04/2019,project_d
101,27/04/2019,02/05/2019,project_d
102,25/01/2019,02/02/2019,project_b
103,02/01/2019,22/01/2019,project_a
103,02/01/2019,22/01/2019,project_b

I do expect duplicate rows in the final output as a single row in the master dataframe would fall under multiple rows of master lookup table

Upvotes: 1

Views: 379

Answers (1)

jezrael
jezrael

Reputation: 862406

I think you need:

df = df1.assign(a=1).merge(df2.assign(a=1), on='a')
m1 = df['start_date_y'].between(df['start_date_x'], df['end_date_x'])
m2 = df['end_date_y'].between(df['start_date_x'], df['end_date_x'])

df = df[m1 | m2]
print (df)
   cust_id start_date_x end_date_x  a start_date_y end_date_y project_name
1      101   2019-02-01 2019-12-01  1   2019-01-14 2019-02-13    project_b
2      101   2019-02-01 2019-12-01  1   2019-02-15 2019-03-13    project_c
3      101   2019-02-01 2019-12-01  1   2019-03-14 2019-06-13    project_d
6      101   2019-02-14 2019-04-24  1   2019-02-15 2019-03-13    project_c
7      101   2019-02-14 2019-04-24  1   2019-03-14 2019-06-13    project_d

Upvotes: 1

Related Questions