Reputation: 490
I've currently got a list of tuples (though I control creation of the list and tuples, so they could be altered in type if needed). Each tuple has a start and end integer and a string with an ID for that range's source. What I'm trying to do is identify all of the overlapping ranges within the tuples.
Currently I have
a = [(0, 98, '122:R'),
(100, 210, '124:R'),
(180, 398, '125:R'),
(200, 298, '123:R')]
highNum = 0
highNumItem = ''
for item in a:
if item[0] < highNum:
print(highNumItem + ' overlaps ' + item[2])
if item[1] > highNum:
highNum = item[1]
highNumItem = item[2]
# 124:R overlaps 125:R
# 125:R overlaps 123:R
Which outputs enough information that overlaps should be able to be manually review and fixed. But, it misses identifying some sets of overlaps. I can't help thinking there's a relatively obvious solution I'm just missing or not using the right search terms to find examples of. But ideally I'd like the output to actually be
124:R overlaps 125:R & 123:R
125:R overlaps 123:R
But using my comparison method, I can't see a way to catch the rare instance where an overlap spans more than just 2 adjacent ranges. If anyone could point me to a function or comparison method appropriate to this, I'd greatly appreciate.
Also, if it matters, I'm currently stuck with python 2.7, but need to be able to port solution to 3.x when 3rd party applications allow it.
Upvotes: 6
Views: 1482
Reputation: 2331
This should work:
import operator
def get_overlaps(end, remaining):
output = []
for r in remaining:
if r[0] < end:
# starts before the end
output.append(r[2])
continue
break
return output
def get_all_overlaps(lst):
# thanks @Elan-R for this simplification
for i, (start, end, name) in enumerate(lst):
overlaps = get_overlaps(end, lst[i+1:])
if overlaps:
print(name, "overlaps", " & ".join(overlaps))
a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
# sort by start time
a.sort(key=operator.itemgetter(0)) # thanks to @moonGoose
get_all_overlaps(a)
Output:
124:R overlaps 125:R & 123:R
125:R overlaps 123:R
This code iterates over each item in the list, and then checks every subsequent item to see if the start time is less than the end time of the current item. If so, it adds the name to the list of overlaps. If not, it stops checking for the current item, as the start times increase, so there will be no further overlaps.
(Tested for Python 3.6, but should work with any version)
Upvotes: 2
Reputation: 6613
Here is an example using intspan to calculate the overlaps. (Using Python 3.8)
from itertools import combinations
from intspan import intspan
a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
d = {}
for one, two in combinations(a,2):
# if the 2 ranges intersect
if intspan.from_range(*one[0:2]) & intspan.from_range(*two[0:2]):
d.setdefault(one[2], []).append(two[2])
for key, v in d.items():
print(key + ',' + ','.join(v))
Prints:
124:R,125:R,123:R
125:R,123:R
Upvotes: 3
Reputation: 564
Checks the higher number of a pair with the lower number of another pair:
a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
for i, base_data in enumerate(a):
for check_data in a[i + 1:]:
if base_data[1] > check_data[0]:
print(f"{base_data[2]} overlaps {check_data[2]}")
#prints:
#124:R overlaps 125:R
#124:R overlaps 123:R
#125:R overlaps 123:R
If you want it stored in groups:
from collections import defaultdict
a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
d = defaultdict(list)
for i, base_data in enumerate(a):
for check_data in a[i + 1:]:
if base_data[1] > check_data[0]:
d[base_data[2]].append(check_data[2])
print(d)
#prints:
#defaultdict(<class 'list'>, {'124:R': ['125:R', '123:R'], '125:R': ['123:R']})
#but can *easily* be iterated over to pretty print:
print("\n".join([f'{key} overlaps {" & ".join(d[key])}' for key in d]))
#prints:
#124:R overlaps 125:R & 123:R
#125:R overlaps 123:R
The dictionary is far better than printing as it actually stores the data instead of just printing it. Printing the data can also be more controlled. Also, using a defaultdict
over a dict
makes the code more compact.
Upvotes: 3
Reputation: 13684
First, make sure you have sorted your list by interval start values. Then you just need to loop through them and group tuples together as long as their interval start value is lower than the end value of the first group item.
items = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
sorted(items, key=lambda x: x[0])
overlaps = []
while items:
overlap = list(takewhile(lambda item:item[0] < items[0][1],items))
overlaps.append(overlap)
items = items[len(overlap):]
Results in overlaps
being
[
[(0, 98, '122:R')],
[(100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
]
Upvotes: 2
Reputation: 289
Here it is a solution in python 3.x
a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
for i in range(len(a)-1):
i_low, i_high, i_id = a[i]
for j in range(i+1, len(a)):
j_low, j_high, j_id = a[j]
if i_low < j_low < i_high or j_low < i_low < j_high:
print(i_id, " overlaps with ", j_id)
In case python 2 didn't support that unpacking system:
a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
for i in range(len(a)-1):
i_low, i_high, i_id = a[i][0], a[i][1], a[i][2]
for j in range(i+1, len(a)):
j_low, j_high, j_id =a[j][0], a[j][1], a[j][2]
if i_low < j_low < i_high or j_low < i_low < j_high:
print(i_id, " overlaps with ", j_id)
Upvotes: 2