Reputation: 1293
I am trying to categorise users based on their lifecycle. Given below Pandas dataframe shows the number of times a customer raised a ticket depending on how long they have used the product.
master dataframe
cust_id,start_date,end_date
101,02/01/2019,12/01/2019
101,14/02/2019,24/04/2019
101,27/04/2019,02/05/2019
102,25/01/2019,02/02/2019
103,02/01/2019,22/01/2019
Master lookup table
start_date,end_date,project_name
01/01/2019,13/01/2019,project_a
14/01/2019,13/02/2019,project_b
15/02/2019,13/03/2019,project_c
14/03/2019,13/06/2019,project_d
I am trying to map the above two data frames such that I am able to add project_name
to the master dataframe
Expected output:
cust_id,start_date,end_date,project_name
101,02/01/2019,12/01/2019,project_a
101,14/02/2019,24/04/2019,project_c
101,14/02/2019,24/04/2019,project_d
101,27/04/2019,02/05/2019,project_d
102,25/01/2019,02/02/2019,project_b
103,02/01/2019,22/01/2019,project_a
103,02/01/2019,22/01/2019,project_b
I do expect duplicate rows in the final output as a single row in the master dataframe would fall under multiple rows of master lookup table
Upvotes: 1
Views: 379
Reputation: 862406
I think you need:
df = df1.assign(a=1).merge(df2.assign(a=1), on='a')
m1 = df['start_date_y'].between(df['start_date_x'], df['end_date_x'])
m2 = df['end_date_y'].between(df['start_date_x'], df['end_date_x'])
df = df[m1 | m2]
print (df)
cust_id start_date_x end_date_x a start_date_y end_date_y project_name
1 101 2019-02-01 2019-12-01 1 2019-01-14 2019-02-13 project_b
2 101 2019-02-01 2019-12-01 1 2019-02-15 2019-03-13 project_c
3 101 2019-02-01 2019-12-01 1 2019-03-14 2019-06-13 project_d
6 101 2019-02-14 2019-04-24 1 2019-02-15 2019-03-13 project_c
7 101 2019-02-14 2019-04-24 1 2019-03-14 2019-06-13 project_d
Upvotes: 1