Reputation: 197
I have some event log data I need to reshape:
Timestamp | Event | SID | User |
---|---|---|---|
2021-04-15 10:35 | OPEN | 7933711 | user2 |
2021-04-15 10:37 | OPEN | 8022822 | user1 |
2021-04-15 10:37 | OPEN | 7933711 | user2 |
2021-04-15 10:41 | OPEN | 1234567 | user2 |
2021-04-15 10:45 | OPEN | 1234567 | user2 |
2021-04-15 11:39 | OPEN | 8022822 | user1 |
2021-04-15 11:59 | DO | 7933711 | user2 |
2021-04-15 12:24 | OPEN | 7933711 | user2 |
2021-04-15 12:59 | DO | 8022822 | user1 |
2021-04-15 13:25 | OPEN | 8022822 | user1 |
2021-04-15 13:29 | OPEN | 7933711 | user2 |
2021-04-15 14:27 | OPEN | 8022822 | user1 |
For each user that does a DO
operation on every SID
I need to find the prior OPEN
event.
Desired outcome:
Timestamp | Event | SID | User |
---|---|---|---|
2021-04-15 10:37 | OPEN | 7933711 | user2 |
2021-04-15 11:39 | OPEN | 8022822 | user1 |
I can get the groups I need using
grouped = sorted.groupby(['SID', 'User']).filter(lambda x: (x['Event']=='DO').any())
but I can't get the next part to extract the OPEN
event before the DO
event. I've looked at using asof
but I don't know pandas well enough.
My code so far:
import numpy as np
import pandas as pd
import sys
eventLogPath = 'test3.csv'
evTimeStamps_df = pd.read_csv(eventLogPath)
evTimeStamps_df['Timestamp'] = pd.to_datetime( evTimeStamps_df['Timestamp'], format='%Y-%m-%d %H:%M:%S')
evTimeStamps_df.Event = pd.Categorical(evTimeStamps_df.Event, ['OPEN', 'DO'], ordered=True)
sorted = evTimeStamps_df.sort_values(['Timestamp', 'Event'])
grouped = sorted.groupby(['SID', 'User']).filter(lambda x: (x['Event']=='DO').any())
Upvotes: 1
Views: 51
Reputation: 41427
Try sorting the df
and then shift()
to get the neighboring OPEN
event:
df.Timestamp = pd.to_datetime(df.Timestamp) # if not already datetime
df = df.sort_values(['User', 'Timestamp', 'SID', 'Event'])
df[df.Event.eq('DO').shift().bfill()]
# Timestamp Event SID User
# 9 2021-04-15 13:25:00 OPEN 8022822 user1
# 7 2021-04-15 12:24:00 OPEN 7933711 user2
Upvotes: 2