Martin Noah
Martin Noah

Reputation: 167

Finding similar or overlapping groups of times in Python

I have a list of start and end times, my_list, which contains different groupings of start and end times.

These records are group therapy session start/end times for patients entered by one therapist. I need similar times grouped together so that I can check if there were more than 3 lists in the groupings. So from 9:00 - 11:00, there were 4 patients when the rules only allow 3.

This question's answers provide help for help finding all of the overlapping times in one given list and I've tried variations of the solutions posted there. But I need to find overlapping times based on groups of similar times.

my_list = [
  [dt.time(8,45), dt.time(11,0)],
  [dt.time(9,0), dt.time(11,0)],
  [dt.time(9,15), dt.time(11,0)],
  [dt.time(9,30), dt.time(11,0)],
  [dt.time(12,30), dt.time(15,0)],
  [dt.time(12,30), dt.time(15,0)],
  [dt.time(12,45), dt.time(15,0)],
  ]

I need to group similar/duplicated/overlapping times...

group_one = [
  [dt.time(8,45), dt.time(11,0)],
  [dt.time(9,0), dt.time(11,0)],
  [dt.time(9,15), dt.time(11,0)],
  [dt.time(9,30), dt.time(11,0)]
]

group_two = [
  [dt.time(12,30), dt.time(15,0)],
  [dt.time(12,30), dt.time(15,0)],
  [dt.time(12,45), dt.time(15,15)]
]

Eventually, checking len(group_one) > 3 returns True

# Pseudo-code
 for times in my_list:
  if start or end times are equal to or overlap each other:
    throw the times into separate, similar lists (or dicts where key = timeslot?)
  else:
    if start or end times....

Solution can be pure Python or Pandas as this data is coming from a dataframe and will be manipulated there. I can't help but feel there is a magical way of doing this in Pandas, just stuck!

Edit: changed data in question to match answer.

Upvotes: 3

Views: 414

Answers (1)

Josmoor98
Josmoor98

Reputation: 1811

Assuming your dataframe looks something like this and there are no cascading overlaps as pointed out in the comments by Prune.

df = pd.DataFrame([[datetime(2019, 8, 7, 8, 45),datetime(2019, 8, 7, 11)],
                   [datetime(2019, 8, 7, 9),datetime(2019, 8, 7, 11)],
                   [datetime(2019, 8, 7, 9, 15),datetime(2019, 8, 7, 11)],
                   [datetime(2019, 8, 7, 9, 30), datetime(2019, 8, 7, 11)],
                   [datetime(2019, 8, 7, 12, 30),datetime(2019, 8, 7, 15)],
                   [datetime(2019, 8, 7, 12, 30),datetime(2019, 8, 7, 15)],
                   [datetime(2019, 8, 7, 12, 45),datetime(2019, 8, 7, 15)],
                   [datetime(2019, 8, 7, 15, 30),datetime(2019, 8, 7, 17)],
                   [datetime(2019, 8, 7, 15, 30),datetime(2019, 8, 7, 17)],
                   [datetime(2019, 8, 7, 15, 45),datetime(2019, 8, 7, 17)],
                   [datetime(2019, 8, 7, 18, 30),datetime(2019, 8, 7, 19)],
                   [datetime(2019, 8, 7, 18, 30),datetime(2019, 8, 7, 19)],
                   [datetime(2019, 8, 7, 18, 45),datetime(2019, 8, 7, 19)]], 
                   columns = ["start", "end"])

If the data structure in your question is consistent with all of your data, then the following should work. If there any cascading cases, as pointed out, this won't work.


Note, df['start'] needs to be sorted for this procedure to work.


idx = df.index[(df['end'].shift() <= df['start'])]
overlapping groups = np.split(df, idx)

Which yields a list of DataFrames.

[                start                 end
 0 2019-08-07 08:45:00 2019-08-07 11:00:00
 1 2019-08-07 09:00:00 2019-08-07 11:00:00
 2 2019-08-07 09:15:00 2019-08-07 11:00:00
 3 2019-08-07 09:30:00 2019-08-07 11:00:00,
                 start                 end
 4 2019-08-07 12:30:00 2019-08-07 15:00:00
 5 2019-08-07 12:30:00 2019-08-07 15:00:00
 6 2019-08-07 12:45:00 2019-08-07 15:00:00,
                 start                 end
 7 2019-08-07 15:30:00 2019-08-07 17:00:00
 8 2019-08-07 15:30:00 2019-08-07 17:00:00
 9 2019-08-07 15:45:00 2019-08-07 17:00:00,
                 start                 end
10 2019-08-07 18:30:00 2019-08-07 19:00:00
11 2019-08-07 18:30:00 2019-08-07 19:00:00
12 2019-08-07 18:45:00 2019-08-07 19:00:00]

Upvotes: 2

Related Questions