Reputation: 157
I have the following dataset,
D_DATE BIN Number Disposition Unit Assigned Disposition ID
2018-01-04 10005 SWO Issued PLUMBING DIVISION 3456
2016-06-23 10005 SWO Issued SCAFFOLD UNIT 3467
2016-06-23 10005 SWO Rescinded SCAFFOLD UNIT 3467
2018-01-17 10005 SWO Rescinded PLUMBING DIVISION 3456
2019-01-04 10006 SWO Rescinded BEST SQUAD 3478
2018-12-21 10006 SWO Issued BEST SQUAD 3478
2020-02-10 10006 SWO Issued BEST SQUAD 3480
2020-02-25 10006 SWO Rescinded BEST SQUAD 3480
df = pd.DataFrame({'D_DATE':['2018-01-04','2016-06-23','2016-06-23','2018-01-17','2019-01-04','2018-12-21','2020-02-10','2020-02-25'],
'BIN Number': ['10005', '10005', '10005', '10005', '10006','10006','10006','10006'] ,
'Disposition': ['SWO Issued', 'SWO Issued', 'SWO Rescinded', 'SWO Rescinded','SWO Rescinded','SWO Issued','SWO Issued','SWO Rescinded'] ,
'Unit Assigned': ['PLUMBING DIVISION', 'SCAFFOLD UNIT', 'SCAFFOLD UNIT', 'PLUMBING DIVISION','BEST SQUAD','BEST SQUAD','BEST SQUAD','BEST SQUAD'],
' Disposition ID':['3456','3467','3467','3456','3478','3478','3480,'3480']})
I want to create a pivot table if possible, so that I have two columns for the date, one column for the issue date and another for rescinded date by each Disposition ID
, but in the pivot I need to maintain the unit, so I should end up with these columns:
Unit Assigned
, Issue Date
, Rescinded Date
,Disposition ID
Then next I want to calculate time difference between Issue Date and Rescinded date. The Disposition ID is unique. Each Disposition has an ID first time when SWO is issued so when again SWO is rescinded it can follow that same ID.
Ouput:
Unit Assigned SWO Issued SWO Rescinded Time Difference Disposition ID
PLUMBING DIVISION 2018-01-04 2018-01-17 13 days 3456
SCAFFOLD UNIT 2016-06-23 2016-06-23 0 days 3467
BEST SQUAD 2018-12-21 2019-01-04 14 days 3478
BEST SQUAD 2020-02-10 2020-02-25 15 days 3480
Appreciate any help. Thanks.
Upvotes: 1
Views: 595
Reputation: 150805
This is very similar to your recent question, except that it is easier since the Disposition ID
is given.
df['D_DATE'] = pd.to_datetime(df['D_DATE'])
(df.pivot_table(index=[' Disposition ID','BIN Number', 'Unit Assigned'],
columns='Disposition',
values='D_DATE',
aggfunc='first')
.reset_index()
.assign(Time_Difference=lambda x: x['SWO Rescinded']-x['SWO Issued'])
)
Upvotes: 1