Reputation: 490

Identify all overlapping tuples in list

I've currently got a list of tuples (though I control creation of the list and tuples, so they could be altered in type if needed). Each tuple has a start and end integer and a string with an ID for that range's source. What I'm trying to do is identify all of the overlapping ranges within the tuples.

Currently I have

a = [(0, 98, '122:R'), 
     (100, 210, '124:R'),
     (180, 398, '125:R'),
     (200, 298, '123:R')]
highNum = 0
highNumItem = ''
for item in a:
    if item[0] < highNum:
        print(highNumItem + ' overlaps ' + item[2])
        if item[1] > highNum:
            highNum = item[1]
            highNumItem = item[2]

    
# 124:R overlaps 125:R
# 125:R overlaps 123:R

Which outputs enough information that overlaps should be able to be manually review and fixed. But, it misses identifying some sets of overlaps. I can't help thinking there's a relatively obvious solution I'm just missing or not using the right search terms to find examples of. But ideally I'd like the output to actually be

124:R overlaps 125:R & 123:R
125:R overlaps 123:R

But using my comparison method, I can't see a way to catch the rare instance where an overlap spans more than just 2 adjacent ranges. If anyone could point me to a function or comparison method appropriate to this, I'd greatly appreciate.

Also, if it matters, I'm currently stuck with python 2.7, but need to be able to port solution to 3.x when 3rd party applications allow it.

Upvotes: 6

Answers (5)

Ed Ward

Reputation: 2331

This should work:

import operator

def get_overlaps(end, remaining):
    output = []
    for r in remaining:
        if r[0] < end:
            # starts before the end
            output.append(r[2])
            continue
        break
    return output

def get_all_overlaps(lst):
    # thanks @Elan-R for this simplification
    for i, (start, end, name) in enumerate(lst):        
        overlaps = get_overlaps(end, lst[i+1:])
        if overlaps:
            print(name, "overlaps", " & ".join(overlaps))


a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]

# sort by start time
a.sort(key=operator.itemgetter(0)) # thanks to @moonGoose
get_all_overlaps(a)

Output:

124:R overlaps 125:R & 123:R
125:R overlaps 123:R

This code iterates over each item in the list, and then checks every subsequent item to see if the start time is less than the end time of the current item. If so, it adds the name to the list of overlaps. If not, it stops checking for the current item, as the start times increase, so there will be no further overlaps.

(Tested for Python 3.6, but should work with any version)

Upvotes: 2

Chris Charley

Reputation: 6613

Here is an example using intspan to calculate the overlaps. (Using Python 3.8)

from itertools import combinations
from intspan import intspan

a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]

d = {}
for one, two in combinations(a,2):
    # if the 2 ranges intersect
    if intspan.from_range(*one[0:2]) & intspan.from_range(*two[0:2]):
        d.setdefault(one[2], []).append(two[2])

for key, v in d.items():
    print(key + ',' + ','.join(v))

Prints:

124:R,125:R,123:R
125:R,123:R

Upvotes: 3

Elan-R

Reputation: 564

Checks the higher number of a pair with the lower number of another pair:

a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]

for i, base_data in enumerate(a):
    for check_data in a[i + 1:]:
        if base_data[1] > check_data[0]:
            print(f"{base_data[2]} overlaps {check_data[2]}")
#prints:
#124:R overlaps 125:R
#124:R overlaps 123:R
#125:R overlaps 123:R

If you want it stored in groups:

from collections import defaultdict

a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
d = defaultdict(list)

for i, base_data in enumerate(a):
    for check_data in a[i + 1:]:
        if base_data[1] > check_data[0]:
            d[base_data[2]].append(check_data[2])


print(d)
#prints:
#defaultdict(<class 'list'>, {'124:R': ['125:R', '123:R'], '125:R': ['123:R']})
#but can *easily* be iterated over to pretty print:
print("\n".join([f'{key} overlaps {" & ".join(d[key])}' for key in d]))
#prints:
#124:R overlaps 125:R & 123:R
#125:R overlaps 123:R

The dictionary is far better than printing as it actually stores the data instead of just printing it. Printing the data can also be more controlled. Also, using a defaultdict over a dict makes the code more compact.

Upvotes: 3

adjan

Reputation: 13684

First, make sure you have sorted your list by interval start values. Then you just need to loop through them and group tuples together as long as their interval start value is lower than the end value of the first group item.

items = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
    
sorted(items, key=lambda x: x[0])

overlaps = []
while items:
  overlap = list(takewhile(lambda item:item[0] < items[0][1],items))
  overlaps.append(overlap)
  items = items[len(overlap):]

Results in overlaps being

[
   [(0, 98, '122:R')],
   [(100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
]

Upvotes: 2

Lorenzo Paolin

Reputation: 289

Here it is a solution in python 3.x

    a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
    
    for i in range(len(a)-1):
        i_low, i_high, i_id = a[i]
        for j in range(i+1, len(a)):
            j_low, j_high, j_id = a[j]
            if i_low < j_low < i_high or j_low < i_low < j_high:
                print(i_id, " overlaps with ", j_id)

In case python 2 didn't support that unpacking system:

    a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
    
    for i in range(len(a)-1):
        i_low, i_high, i_id = a[i][0], a[i][1], a[i][2]
        for j in range(i+1, len(a)):
            j_low, j_high, j_id =a[j][0], a[j][1], a[j][2]
            if i_low < j_low < i_high or j_low < i_low < j_high:
                print(i_id, " overlaps with ", j_id)

Upvotes: 2

Identify all overlapping tuples in list

Answers (5)

Related Questions