Tahsin Alam
Tahsin Alam

Reputation: 157

Pandas pivot table on DATE column

I have the following dataset,

D_DATE       BIN Number   Disposition    Unit Assigned        Disposition ID
2018-01-04    10005      SWO Issued      PLUMBING DIVISION     3456
2016-06-23    10005      SWO Issued      SCAFFOLD UNIT         3467
2016-06-23    10005      SWO Rescinded   SCAFFOLD UNIT         3467
2018-01-17    10005      SWO Rescinded   PLUMBING DIVISION     3456
2019-01-04    10006      SWO Rescinded   BEST SQUAD            3478
2018-12-21    10006      SWO Issued      BEST SQUAD            3478
2020-02-10    10006      SWO Issued      BEST SQUAD            3480  
2020-02-25    10006      SWO Rescinded   BEST SQUAD            3480

df = pd.DataFrame({'D_DATE':['2018-01-04','2016-06-23','2016-06-23','2018-01-17','2019-01-04','2018-12-21','2020-02-10','2020-02-25'],
                        'BIN Number': ['10005', '10005', '10005', '10005', '10006','10006','10006','10006'] ,
                       'Disposition': ['SWO Issued', 'SWO Issued', 'SWO Rescinded', 'SWO Rescinded','SWO Rescinded','SWO Issued','SWO Issued','SWO Rescinded'] ,
                       'Unit Assigned': ['PLUMBING DIVISION', 'SCAFFOLD UNIT', 'SCAFFOLD UNIT', 'PLUMBING DIVISION','BEST SQUAD','BEST SQUAD','BEST SQUAD','BEST SQUAD'],
                       ' Disposition ID':['3456','3467','3467','3456','3478','3478','3480,'3480']}) 

I want to create a pivot table if possible, so that I have two columns for the date, one column for the issue date and another for rescinded date by each Disposition ID, but in the pivot I need to maintain the unit, so I should end up with these columns:

Unit Assigned, Issue Date, Rescinded Date,Disposition ID

Then next I want to calculate time difference between Issue Date and Rescinded date. The Disposition ID is unique. Each Disposition has an ID first time when SWO is issued so when again SWO is rescinded it can follow that same ID.

Ouput:

   Unit Assigned      SWO Issued     SWO Rescinded    Time Difference   Disposition ID
    PLUMBING DIVISION  2018-01-04     2018-01-17        13 days          3456
    SCAFFOLD UNIT      2016-06-23     2016-06-23        0 days           3467 
    BEST SQUAD         2018-12-21     2019-01-04        14 days          3478
    BEST SQUAD         2020-02-10     2020-02-25        15 days          3480

Appreciate any help. Thanks.

Upvotes: 1

Views: 595

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150805

This is very similar to your recent question, except that it is easier since the Disposition ID is given.

df['D_DATE'] = pd.to_datetime(df['D_DATE'])

(df.pivot_table(index=[' Disposition ID','BIN Number', 'Unit Assigned'],
              columns='Disposition',
              values='D_DATE',
              aggfunc='first')
   .reset_index()
   .assign(Time_Difference=lambda x: x['SWO Rescinded']-x['SWO Issued'])
)

Upvotes: 1

Related Questions