Mystical Me
Mystical Me

Reputation: 137

Concurrent Conflicts

I'm trying to find consecutive datetimes in Python. I got to the point where I'm able to find if each row conflicts through a loop but am stuck on how to find out if an Event was concurrent. Any suggestions on how I can accomplish this? Open to a similar approach as well!

Concurrent means when Name and Event Date is the same and the count of consecutive conflicts >= 3.

Sample Data 1:

Event ID Name Date Event Start Event End
123 Hoper, Charles 8/4/20 8/4/20 8:30 AM 8/4/20 10:30 AM
456 Hoper, Charles 8/4/20 8/4/20 8:50 AM 8/4/20 9:20 AM
789 Hoper, Charles 8/4/20 8/4/20 8:30 AM 8/4/20 10 AM
1011 Perez, Daniel 8/10/20 8/10/20 9 AM 8/10/20 11 AM
1213 Shah, Kim 8/5/20 8/5/20 12 PM 8/5/20 1 PM
1415 Shah, Kim 8/5/20 8/5/20 12:30 PM 8/5/20 1 PM

Current Code:

import pandas as pd
import numpy as np

df = pd.read_excel(r'[Path]\TestConcurrent.xlsx')

df['Start'] = df['Event Start']
df['End'] = df['Event End']



df['conflict'] = len(df) 

 #Edit from answer
    df['concurrent']  = df.groupby(['Name','Event Date','conflict'])['Event ID'].transform('count').ge(3)
    
    print(df)

Current Output:

[![enter image description here][1]][1]

Expected Output:

[![enter image description here][2]][2]

Upvotes: 3

Views: 144

Answers (1)

mitoRibo
mitoRibo

Reputation: 4548

Ok, final approach! If I understand then you don't care about the "Consecutive" column, you just want to know about the 3 overlapping windows in a row. Here's an approach that tries to answer that question immediately. It passes the two test datasets (thanks for editing!)

import pandas as pd
import numpy as np
import itertools
import io

#Creating the test df you provided
df = pd.read_csv(io.StringIO("""
Event ID;Name;Date;Event Start;Event End
123;Hoper, Charles;8/4/20;8/4/20 8:30 AM;8/4/20 10:30 AM
456;Hoper, Charles;8/4/20;8/4/20 8:50 AM;8/4/20 9:20 AM
789;Hoper, Charles;8/4/20;8/4/20 8:30 AM;8/4/20 10 AM
1011;Perez, Daniel;8/10/20;8/10/20 9 AM;8/10/20 11 AM
1213;Shah, Kim;8/5/20;8/5/20 12 PM;8/5/20 1 PM
1415;Shah, Kim;8/5/20;8/5/20 12:30 PM;8/5/20 1 PM
"""),sep=';')

#Overridding with The second test df
df = pd.read_csv(io.StringIO("""
Event ID;Name;Date;Event Start;Event End
88;Cooper, Herbert;10/20/20;10/20/20 8:10 AM;10/20/20 9:48 AM
99;Cooper, Herbert;10/20/20;10/20/20 9:19 AM;10/20/20 11:30 AM
10;Cooper, Herbert;10/20/20;10/20/20 11:52 AM;10/20/20 1:26 PM
11;Cooper, Herbert;10/20/20;10/20/20 1:22 AM;10/20/20 2:15 PM
12;Cooper, Herbert;10/20/20;10/20/20 3:23 PM;10/20/20 4:10 PM
"""),sep=';')

df['Event Start'] = pd.to_datetime(df['Event Start'])
df['Event End'] = pd.to_datetime(df['Event End'])
df['overlap'] = False

#Iterating line by line keeping track of whether 3 conflicts are found
last_name = None
last_date = None
last_end = pd.Timestamp.max
num_consecutive = 0

for i,r in df.iterrows():
    
    streak_continues = all([
        last_name == r['Name'],
        last_date == r['Date'],
        r['Event Start'] < last_end,
    ])
        
    if not streak_continues:
        if num_consecutive >= 3:
            df.loc[
                df['Name'].eq(last_name) & df['Date'].eq(last_date),
                'overlap'
            ] = True
            
        num_consecutive = 0
        
    last_name = r['Name']
    last_date = r['Date']
    last_end = r['Event End']
    num_consecutive += 1
        
df

Upvotes: 3

Related Questions