Reputation: 449
Create data
actions = ['Start','Action1','Action2','Pause','Actoin2','Resume','Action1','Finish','Start','Action1','Finish']
start_date = datetime.datetime.strptime('14/10/21 09:00:00', '%d/%m/%y %H:%M:%S')
date_list = [start_date + datetime.timedelta(seconds=x) for x in range(0,11)]
values = [1,1,2,1,2,1,5,1,1,1,1]
df = pd.DataFrame({'ActionType': actions,
'Timestamp': date_list,
'Value': values})
ActionType | Timestamp | Value |
---|---|---|
Start | 2021-10-14 09:00:00 | 1 |
Action1 | 2021-10-14 09:00:01 | 1 |
Action2 | 2021-10-14 09:00:02 | 2 |
Pause | 2021-10-14 09:00:03 | 1 |
Action2 | 2021-10-14 09:00:04 | 2 |
Restart | 2021-10-14 09:00:05 | 1 |
Action1 | 2021-10-14 09:00:06 | 5 |
Finish | 2021-10-14 09:00:07 | 1 |
Start | 2021-10-14 09:00:08 | 1 |
Action1 | 2021-10-14 09:00:09 | 1 |
Finish | 2021-10-14 09:00:010 | 1 |
See how there are two "sessions" taking place. I want to mark each session in a new column.
The output should look like this:
output = pd.DataFrame({'Session:': [0,0,0,0,0,0,0,0,1,1,1],
'ActionType': actions,
'Timestamp': date_list,
'RealTimeElapsed': [0,1,2,3,3,3,4,5,0,1,2],
'Value': values
})
Session | ActionType | Timestamp | RealTimeElapsed | Value |
---|---|---|---|---|
0 | Start | 2021-10-14 09:00:00 | 0 | 1 |
0 | Action1 | 2021-10-14 09:00:01 | 1 | 1 |
0 | Action2 | 2021-10-14 09:00:02 | 2 | 1 |
0 | Pause | 2021-10-14 09:00:03 | 3 | 1 |
0 | Action2 | 2021-10-14 09:00:04 | 3 | 1 |
0 | Resume | 2021-10-14 09:00:05 | 3 | 1 |
0 | Action1 | 2021-10-14 09:00:06 | 4 | 1 |
0 | Finish | 2021-10-14 09:00:07 | 5 | 1 |
1 | Start | 2021-10-14 09:00:08 | 0 | 1 |
1 | Action1 | 2021-10-14 09:00:09 | 1 | 1 |
1 | Finish | 2021-10-14 09:00:010 | 2 | 1 |
Already considered:
Upvotes: 0
Views: 143
Reputation:
My solution is:
df.insert(0,"Session",np.where(df["ActionType"]=="Start",1,0).cumsum()-1)
def ufunc(df):
a = list()
for i,j in df.groupby(df.ActionType.isin(["Start","Pause","Resume"]).cumsum()):
if j.ActionType.iloc[0] == "Start":
a.extend(np.array(range(len(j))))
elif j.ActionType.iloc[0] == "Pause":
a.extend([max(a)+1] * len(j))
elif j.ActionType.iloc[0] == "Resume":
a.extend(np.array(range(len(j))) + max(a))
return a
df.insert(len(df.columns)-1,"RealTimeElapsed",df.groupby("Session").apply(ufunc).explode().values)
df
Session ActionType Timestamp RealTimeElapsed Value
0 0 Start 2021-10-14 09:00:00 0 1
1 0 Action1 2021-10-14 09:00:01 1 1
2 0 Action2 2021-10-14 09:00:02 2 2
3 0 Pause 2021-10-14 09:00:03 3 1
4 0 Actoin3 2021-10-14 09:00:04 3 2
5 0 Resume 2021-10-14 09:00:05 3 1
6 0 Action1 2021-10-14 09:00:06 4 5
7 0 Finish 2021-10-14 09:00:07 5 1
8 1 Start 2021-10-14 09:00:08 0 1
9 1 Action1 2021-10-14 09:00:09 1 1
10 1 Finish 2021-10-14 09:00:10 2 1
Upvotes: 1