Milo
Milo

Reputation: 185

Find overlapping time segments in python

I'm trying to check whether two time segments in the same file_id (identified by their duration expressed as time_from and time_to) overlap in the following dataframes:

df1
    id,file_id,time_from,time_to
    1,10,00:00:19,00:00:25
    2,12,00:02:39,00:02:49
    3,12,00:04:18,00:04:30
    4,12,00:05:30,00:05:55
    5,15,00:01:35,00:01:38
    6,18,00:07:35,00:07:48


df2 
    id,file_id,time_from,time_to
    1,10,00:00:18,00:00:26
    2,12,00:02:30,00:02:49
    3,12,00:05:28,00:05:56
    4,15,00:01:40,00:01:50

Basically I'd like to count the number of times segments overlap in the two dataframes for the same file_id, allowing for a given degree of distance between time_from and time_to (say +/- 4 seconds).

The output should be something like:

file_id, number_of_overlapping_segments
10, 1
12, 2
15, 1
18, 0

Here's the only case when two time segments are not overlapping and the algorithm should return 0 (assuming the distance is > 4 seconds). Every other situation should return 1:

               [_____]

   [________]           [______]

My approach so far has been to use pandas to extend the time values by 4 seconds left and right (limits) of the given time segment and create temporary dataframes that hold the rows with overlapping time segments. Example:

import pandas as pd

left_overlap = []
right_overlap = []

for f in list(set(df1.file_id) & set(df2.file_id)):
    for t_from, t_from_lim in list(zip(df2[df2.file_id==f]['time_from'],
                                       df1[df1file_id==f]['time_from_limit'])):
        if t_from > t_from_lim:
            left_overlap.append(df_2[(df2.file_id==audio) & \
                                             (df2.time_from==t_from)])

df_left_overlap = pd.concat(left_overlap).reset_index(drop=True)


for f in list(set(df1.file_id) & set(df2.file_id)):
    for t_to, t_to_lim in list(zip(df_left_overlap[df_left_overlap.file_id==f]['time_to'],
                                       df1[df1.file_id==f]['time_to_limit'])):
        if t_to < t_to_lim:
            right_overlap.append(
                df_left_overlap[(df_left_overlap.file_id==f) & \
                                    (df_left_overlap.time_to==t_to)])

overlap = pd.concat(right_overlap)

I think that this solution isn't efficient and I was hoping to find a more robust way of doing it.

Thanks in advance!

Upvotes: 3

Views: 700

Answers (1)

Andru Luvisi
Andru Luvisi

Reputation: 25318

It looks like you're adding properties that are time_from-(4 seconds) and time_to+(4 seconds). (Is that what time_from_limit and time_to_limit are?) It also looks like you're trying to detect overlaps, when it's actually much easier to test whether they don't overlap, and negate it.

For example (pseudocode):

for segment1 in df1:
    for segment2 in df2 records with matching fileid:
        if not (segment1.['time_from'] >= segment2.['time_to'] + (4 seconds) or
                segment2.['time_from'] >= segment1.['time_to'] + (4 seconds)):
        # They overlap
        counters[fileid]++

Upvotes: 1

Related Questions