Reputation: 81
I'm using the top solution here to determine the number of rows that have start and end times overlapping with the given row. However, I need these overlaps to be determined by groups and not across the whole dataframe.
The data I'm working with has start and end times for conversations and the name of the person involved:
id start_time end_time name
1 2021-02-10 10:37:35 2021-02-10 12:16:22 Bob
2 2021-02-10 11:09:39 2021-02-10 13:06:25 Bob
3 2021-02-10 12:10:33 2021-02-10 17:06:26 Bob
4 2021-02-10 15:05:08 2021-02-10 21:07:05 Sally
5 2021-02-10 21:07:26 2021-02-10 21:26:37 Sally
This is the solution from the previous post:
ends = df['start_time'].values < df['end_time'].values[:, None]
starts = df['start_time'].values > df['start_time'].values[:, None]
d['overlap'] = (ends & starts).sum(0)
df
But this records overlap between conversations 3 and 4, whereas I'm only looking for overlap between 1 - 3 or between 4 - 5.
What I'm getting now:
id start_time end_time name overlap
1 2021-02-10 10:37:35 2021-02-10 12:16:22 Bob 2
2 2021-02-10 11:09:39 2021-02-10 13:06:25 Bob 1
3 2021-02-10 12:10:33 2021-02-10 17:06:26 Bob 1
4 2021-02-10 15:05:08 2021-02-10 21:07:05 Sally 1
5 2021-02-10 21:07:26 2021-02-10 21:26:37 Sally 0
What I'd like to get:
id start_time end_time name overlap
1 2021-02-10 10:37:35 2021-02-10 12:16:22 Bob 2
2 2021-02-10 11:09:39 2021-02-10 13:06:25 Bob 1
3 2021-02-10 12:10:33 2021-02-10 17:06:26 Bob 0
4 2021-02-10 15:05:08 2021-02-10 21:07:05 Sally 1
5 2021-02-10 21:07:26 2021-02-10 21:26:37 Sally 0
Upvotes: 0
Views: 624
Reputation: 1284
I think this might give what you need.
Add in an extra & condition for matching on name too:
ends = df['start_time'].values < df['end_time'].values[:, None]
starts = df['start_time'].values > df['start_time'].values[:, None]
same_group = (df['name'].values == df['name'].values[:, None])
# sum across axis=1 !!!
df['overlap'] = (ends & starts & same_group).sum(1)
df
Upvotes: 1