Reputation: 7997
I have two dataframes - one of calls made to customers and another identifying active service durations by client. Each client can have multiple services, but they will not overlap.
df_calls = pd.DataFrame([['A','2016-02-03',1],['A','2016-05-11',2],['A','2016-10-01',3],['A','2016-11-02',4],
['B','2016-01-10',5],['B','2016-04-25',6]], columns = ['cust_id','call_date','call_id'])
print df_calls
cust_id call_date call_id
0 A 2016-02-03 1
1 A 2016-05-11 2
2 A 2016-10-01 3
3 A 2016-11-02 4
4 B 2016-01-10 5
5 B 2016-04-25 6
and
df_active = pd.DataFrame([['A','2016-01-10','2016-03-15',1],['A','2016-09-10','2016-11-15',2],
['B','2016-01-02','2016-03-17',3]], columns = ['cust_id','service_start','service_end','service_id'])
print df_active
cust_id service_start service_end service_id
0 A 2016-01-10 2016-03-15 1
1 A 2016-09-10 2016-11-15 2
2 B 2016-01-02 2016-03-17 3
I need to find the service_id each calls belongs to, identified by service_start and service_end dates. If a call does not fall between dates, they should remain in the dataset.
Here's what I tried so far:
df_test_output = pd.merge(df_calls,df_active, how = 'left',on = ['cust_id'])
df_test_output = df_test_output[(df_test_output['call_date']>= df_test_output['service_start'])
& (df_test_output['call_date']<= df_test_output['service_end'])].drop(['service_start','service_end'],axis = 1)
print df_test_output
cust_id call_date call_id service_id
0 A 2016-02-03 1 1
5 A 2016-10-01 3 2
7 A 2016-11-02 4 2
8 B 2016-01-10 5 3
This drops all the calls that were not between service dates. Any thoughts on how I can merge on the service_id where it meets the criteria, but retain the remaining records?
The result should look like this:
#do black magic
print df_calls
cust_id call_date call_id service_id
0 A 2016-02-03 1 1.0
1 A 2016-05-11 2 NaN
2 A 2016-10-01 3 2.0
3 A 2016-11-02 4 2.0
4 B 2016-01-10 5 3.0
5 B 2016-04-25 6 NaN
Upvotes: 3
Views: 642