Armbie
Armbie

Reputation: 197

Extracting dataframe rows in groups before specific row?

I have some event log data I need to reshape:

Timestamp Event SID User
2021-04-15 10:35 OPEN 7933711 user2
2021-04-15 10:37 OPEN 8022822 user1
2021-04-15 10:37 OPEN 7933711 user2
2021-04-15 10:41 OPEN 1234567 user2
2021-04-15 10:45 OPEN 1234567 user2
2021-04-15 11:39 OPEN 8022822 user1
2021-04-15 11:59 DO 7933711 user2
2021-04-15 12:24 OPEN 7933711 user2
2021-04-15 12:59 DO 8022822 user1
2021-04-15 13:25 OPEN 8022822 user1
2021-04-15 13:29 OPEN 7933711 user2
2021-04-15 14:27 OPEN 8022822 user1

For each user that does a DO operation on every SID I need to find the prior OPEN event.

Desired outcome:

Timestamp Event SID User
2021-04-15 10:37 OPEN 7933711 user2
2021-04-15 11:39 OPEN 8022822 user1

I can get the groups I need using

grouped = sorted.groupby(['SID', 'User']).filter(lambda x: (x['Event']=='DO').any())

but I can't get the next part to extract the OPEN event before the DO event. I've looked at using asof but I don't know pandas well enough.

My code so far:

import numpy as np
import pandas as pd
import sys

eventLogPath = 'test3.csv'

evTimeStamps_df = pd.read_csv(eventLogPath)

evTimeStamps_df['Timestamp'] = pd.to_datetime( evTimeStamps_df['Timestamp'], format='%Y-%m-%d %H:%M:%S')
evTimeStamps_df.Event = pd.Categorical(evTimeStamps_df.Event, ['OPEN', 'DO'], ordered=True)

sorted = evTimeStamps_df.sort_values(['Timestamp', 'Event'])

grouped = sorted.groupby(['SID', 'User']).filter(lambda x: (x['Event']=='DO').any())

Upvotes: 1

Views: 51

Answers (1)

tdy
tdy

Reputation: 41427

Try sorting the df and then shift() to get the neighboring OPEN event:

df.Timestamp = pd.to_datetime(df.Timestamp) # if not already datetime

df = df.sort_values(['User', 'Timestamp', 'SID', 'Event'])
df[df.Event.eq('DO').shift().bfill()]

#              Timestamp  Event      SID   User
# 9  2021-04-15 13:25:00   OPEN  8022822  user1
# 7  2021-04-15 12:24:00   OPEN  7933711  user2

Upvotes: 2

Related Questions