Reputation: 173
I have a dataframe, where I have a ticket IDs and subsequent actions taken with the date time column a shown in the table below.
ticketID ChangeDate OldStatus NewStatus
0 1012327 2019-03-18 09:00:32.903 R or O Action mail sent to client
1 1012327 2019-03-18 09:21:34.820 Action mail sent to client Response Client - R
2 1012327 2019-03-18 09:34:21.890 Response Client - R Status Updated
3 1012328 2019-03-18 07:00:09.960 R or O ticket Closed - None
4 1012328 2019-03-18 07:09:31.420 ticket Closed - None Status Updated
5 1012329 2019-03-18 06:52:03.490 R or O ticket Closed - Satisfied
6 1012329 2019-03-18 07:09:33.433 ticket Closed - Satisfied Status Updated
7 1012330 2019-03-18 10:25:13.493 R or O Action mail sent to Service
8 1012330 2019-03-18 10:55:20.963 Action mail sent to Service ticket Closed - Service Responded
9 1012330 2019-03-18 11:02:05.327 ticket Closed - Service Responded Status Updated
10 1012332 2019-03-18 09:00:41.967 R or O Action mail sent to client
11 1012332 2019-03-18 10:24:20.150 Action mail sent to client Response Client - R
12 1012332 2019-03-18 10:32:40.717 Response Client - R Status Updated
Now, I have certain ticket IDs and some of them are having more observations based on the statuses provided. You can also see that, for a ticket id, for next observation, new status becomes old status and a new status is provided for the same, which keeps updated, until some closing action i taken.
I want to create a new dataframe/series which has the following format.
ticket ID | Datetime1 | Oldest Status | New Status | Datetime2 | New Status2| Datetime3 | New Status3 ....
so that I have the oldest status and new statuses with dates as shown above till we do this for each of the ticket ID.
My plan is to use this dataset, to calculate the time differences later on.
Upvotes: 1
Views: 93
Reputation: 148965
I would first use a groupby on ticketID
to compute a rank per ticket, then pivot the dataframe using that rank as column and ticketID as index to obtain the expected data.
After sorting the columns you get the expected dataframe. Time to rename the columns and reset the index to have a nice dataframe. Code could be:
df['rank'] = df.groupby('ticketID').apply(lambda x:
pd.Series(range(len(x)))).values
resul = df.pivot('ticketID', 'rank').fillna('')
resul.columns = resul.columns.swaplevel()
resul.sort_index(axis=1,inplace=True, level=0, sort_remaining=False)
resul.columns = ['{1}_{0}'.format(*c) for c in resul.columns]
resul.reset_index(inplace=True)
With your example data it gives:
ticketID ChangeDate_0 OldStatus_0 NewStatus_0 ChangeDate_1 OldStatus_1 NewStatus_1 ChangeDate_2 OldStatus_2 NewStatus_2
0 1012327 2019-03-18 09:00:32.903 R or O Action mail sent to client 2019-03-18 09:21:34.820 Action mail sent to client Response Client - R 2019-03-18 09:34:21.890 Response Client - R Status Updated
1 1012328 2019-03-18 07:00:09.960 R or O ticket Closed - None 2019-03-18 07:09:31.420 ticket Closed - None Status Updated
2 1012329 2019-03-18 06:52:03.490 R or O ticket Closed - Satisfied 2019-03-18 07:09:33.433 ticket Closed - Satisfied Status Updated
3 1012330 2019-03-18 10:25:13.493 R or O Action mail sent to Service 2019-03-18 10:55:20.963 Action mail sent to Service ticket Closed - Service Responded 2019-03-18 11:02:05.327 ticket Closed - Service Responded Status Updated
4 1012332 2019-03-18 09:00:41.967 R or O Action mail sent to client 2019-03-18 10:24:20.150 Action mail sent to client Response Client - R 2019-03-18 10:32:40.717 Response Client - R Status Updated
Upvotes: 2
Reputation: 42916
We can groupby
on ticketID
and aggregate to a list for each column. Then we split these lists to columns and give them a prefix in this case to be able to sort them correctly:
def aggregate_data(dataframe, groupcol, col):
grp = dataframe.groupby(groupcol).agg(list)
temp = pd.DataFrame(grp[col].to_numpy().tolist(), index=grp.index)
temp.columns = [f'{c}_{col}' for c in temp.columns]
return temp
new = pd.concat([aggregate_data(df, 'ticketID', column) for column in df.iloc[:, 1:].columns],
axis=1).sort_index(axis=1).reset_index()
ticketID 0_ChangeDate 0_NewStatus 0_OldStatus 1_ChangeDate 1_NewStatus 1_OldStatus 2_ChangeDate
0 1012327 2019-03-18 09:00:32.903 Action mail sent to client R or O 2019-03-18 09:21:34.820 Response Client - R Action mail sent to client 2019-03-18 09:34:21.890
1 1012328 2019-03-18 07:00:09.960 ticket Closed - None R or O 2019-03-18 07:09:31.420 Status Updated ticket Closed - None None
2 1012329 2019-03-18 06:52:03.490 ticket Closed - Satisfied R or O 2019-03-18 07:09:33.433 Status Updated ticket Closed - Satisfied None
3 1012330 2019-03-18 10:25:13.493 Action mail sent to Service R or O 2019-03-18 10:55:20.963 ticket Closed - Service Responded Action mail sent to Service 2019-03-18 11:02:05.327
4 1012332 2019-03-18 09:00:41.967 Action mail sent to client R or O 2019-03-18 10:24:20.150 Response Client - R Action mail sent to client 2019-03-18 10:32:40.717
Upvotes: 2