Create a new Dataframe based on Time Difference and a condition on columns in pandas dataframe

Question

I have a dataframe, where I have a ticket IDs and subsequent actions taken with the date time column a shown in the table below.

ticketID    ChangeDate  OldStatus   NewStatus
0   1012327 2019-03-18 09:00:32.903 R or O  Action mail sent to client
1   1012327 2019-03-18 09:21:34.820 Action mail sent to client  Response Client - R
2   1012327 2019-03-18 09:34:21.890 Response Client - R Status Updated
3   1012328 2019-03-18 07:00:09.960 R or O  ticket Closed - None
4   1012328 2019-03-18 07:09:31.420 ticket Closed - None    Status Updated
5   1012329 2019-03-18 06:52:03.490 R or O  ticket Closed - Satisfied
6   1012329 2019-03-18 07:09:33.433 ticket Closed - Satisfied   Status Updated
7   1012330 2019-03-18 10:25:13.493 R or O  Action mail sent to Service
8   1012330 2019-03-18 10:55:20.963 Action mail sent to Service ticket Closed - Service Responded
9   1012330 2019-03-18 11:02:05.327 ticket Closed - Service Responded   Status Updated
10  1012332 2019-03-18 09:00:41.967 R or O  Action mail sent to client
11  1012332 2019-03-18 10:24:20.150 Action mail sent to client  Response Client - R
12  1012332 2019-03-18 10:32:40.717 Response Client - R Status Updated

Now, I have certain ticket IDs and some of them are having more observations based on the statuses provided. You can also see that, for a ticket id, for next observation, new status becomes old status and a new status is provided for the same, which keeps updated, until some closing action i taken.

I want to create a new dataframe/series which has the following format.

ticket ID  | Datetime1  | Oldest Status  | New Status | Datetime2  | New Status2| Datetime3  | New Status3 ....

so that I have the oldest status and new statuses with dates as shown above till we do this for each of the ticket ID.

My plan is to use this dataset, to calculate the time differences later on.

Serge Ballesta · Accepted Answer

I would first use a groupby on ticketID to compute a rank per ticket, then pivot the dataframe using that rank as column and ticketID as index to obtain the expected data.

After sorting the columns you get the expected dataframe. Time to rename the columns and reset the index to have a nice dataframe. Code could be:

df['rank'] = df.groupby('ticketID').apply(lambda x:
                                          pd.Series(range(len(x)))).values
resul = df.pivot('ticketID', 'rank').fillna('')
resul.columns = resul.columns.swaplevel()
resul.sort_index(axis=1,inplace=True, level=0, sort_remaining=False)
resul.columns = ['{1}_{0}'.format(*c) for c in resul.columns]
resul.reset_index(inplace=True)

With your example data it gives:

   ticketID             ChangeDate_0 OldStatus_0                  NewStatus_0             ChangeDate_1                  OldStatus_1                        NewStatus_1             ChangeDate_2                        OldStatus_2     NewStatus_2
0   1012327  2019-03-18 09:00:32.903      R or O   Action mail sent to client  2019-03-18 09:21:34.820   Action mail sent to client                Response Client - R  2019-03-18 09:34:21.890                Response Client - R  Status Updated
1   1012328  2019-03-18 07:00:09.960      R or O         ticket Closed - None  2019-03-18 07:09:31.420         ticket Closed - None                     Status Updated                                                                            
2   1012329  2019-03-18 06:52:03.490      R or O    ticket Closed - Satisfied  2019-03-18 07:09:33.433    ticket Closed - Satisfied                     Status Updated                                                                            
3   1012330  2019-03-18 10:25:13.493      R or O  Action mail sent to Service  2019-03-18 10:55:20.963  Action mail sent to Service  ticket Closed - Service Responded  2019-03-18 11:02:05.327  ticket Closed - Service Responded  Status Updated
4   1012332  2019-03-18 09:00:41.967      R or O   Action mail sent to client  2019-03-18 10:24:20.150   Action mail sent to client                Response Client - R  2019-03-18 10:32:40.717                Response Client - R  Status Updated

Create a new Dataframe based on Time Difference and a condition on columns in pandas dataframe

Answers (2)

Related Questions