Filter a list of integer ranges by both a maximum overlap percentage and value

Question

I have a list of ranges such as:

[12-48,40-80,60-105,110-130,75-400]

I need to remove the ranges that overlap another range by more than x positions (more than 10, for example) and/or by more than x% (let's say 20%) of the smaller of the two compared ranges.
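
To make the two limits concrete, here is a minimal sketch of the pairwise check. It assumes each range is a half-open (start, stop) tuple, matching the `GeneA[start:stop]` slicing in the code further down, and it reads "and/or" as "either limit exceeded". The names `overlap_len` and `exceeds_limits` are made up for illustration.

```python
def overlap_len(a, b):
    """Number of positions shared by two half-open (start, stop) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def exceeds_limits(a, b, max_overlap=10, max_fraction=0.2):
    """True if a and b overlap by more than either limit (assumed 'or' rule)."""
    shared = overlap_len(a, b)
    # The percentage is taken relative to the smaller of the two ranges.
    smaller = min(a[1] - a[0], b[1] - b[0])
    return shared > max_overlap or shared > max_fraction * smaller


print(exceeds_limits((40, 80), (75, 400)))   # False: 5 shared, 12.5% of 40
print(exceeds_limits((60, 105), (75, 400)))  # True: 30 shared, above 10
```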

At the moment I use a for loop to check each range in turn and compare it to the next one to see whether they overlap beyond my stated limits and, if so, remove it. This does not work: for the example above I get this result:

[12-48,75-400]

The range [40-80] should not have been removed, because it does not overlap either of the two remaining ranges beyond the limits. It was removed only because it overlapped [60-105] and was the smaller of the two. The correct remaining ranges should be:

[12-48,40-80,75-400]

I do not think a simple for loop is the solution here but I am at a loss. Please let me know if anything is unclear.
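
One possible way to avoid comparing only neighbouring ranges is to test each candidate against every range that has actually been kept. The sketch below is not a confirmed solution. It assumes half-open (start, stop) tuples, that longer ranges take priority (as the `length > prevlength` test in the code suggests), and that exceeding either limit counts as a conflict. `filter_ranges`, `max_overlap` and `max_fraction` are hypothetical names.

```python
def filter_ranges(ranges, max_overlap=10, max_fraction=0.2):
    """Greedily keep ranges, longest first, that fit with all kept ranges."""

    def conflict(a, b):
        # Shared positions of two half-open (start, stop) ranges.
        shared = max(0, min(a[1], b[1]) - max(a[0], b[0]))
        # Percentage limit is relative to the smaller range.
        smaller = min(a[1] - a[0], b[1] - b[0])
        return shared > max_overlap or shared > max_fraction * smaller

    kept = []
    # Assumption: longer ranges win, so visit them first.
    for cand in sorted(ranges, key=lambda r: r[1] - r[0], reverse=True):
        # A candidate is only rejected by a range that is really kept,
        # never by one that was itself removed earlier.
        if not any(conflict(cand, k) for k in kept):
            kept.append(cand)
    return sorted(kept)


ranges = [(12, 48), (40, 80), (60, 105), (110, 130), (75, 400)]
print(filter_ranges(ranges, max_fraction=0.25))
# [(12, 48), (40, 80), (75, 400)]
print(filter_ranges(ranges, max_fraction=0.2))
# [(40, 80), (75, 400)]
```

With a strict 20% limit measured against the smaller range, 12-48 and 40-80 share 8 positions, which is about 22% of 36, so 12-48 is dropped. A limit of 25%, or measuring against the range currently being tested as the code in this post does, gives the expected list.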

Current Code

The parts with GeneA/GenePrev/Gene_AND are how I calculate the % overlap and can be ignored.

        start = int(key.split(',')[0])
        stop = int(key.split(',')[1])
        length = stop - start
        if First:
            Both_Frames[key] = value
            First = False
            GeneA[start:stop] = [1] * (stop - start)
            GenePrev = GeneA
            PrevStart = start
            PrevStop = stop
            prevlength = PrevStop - PrevStart
        else:
            GeneA[start:stop] = [1] * (stop - start)
            Gene_AND = GenePrev & GeneA

            if start == PrevStart:
                GenePrev = GeneA
                
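                # Need to delete the overlapping item from the dictionary.
                # Note: on an OrderedDict, popitem(last=False) removes the
                # oldest entry, not the most recently added one.
                Both_Frames.popitem(last=False)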
                Both_Frames[key] = value
                PrevStart = start
                PrevStop = stop
                prevlength = PrevStop - PrevStart
            elif start >= PrevStart and stop <= PrevStop:
                # Fully contained in the previous range: skip it.
                continue
            elif  np.count_nonzero(Gene_AND) <= (length * OverLapPercentage) and np.count_nonzero(Gene_AND) <= OverLapNT:
                GenePrev = GeneA
                Both_Frames[key] = value
                PrevStart = start
                PrevStop = stop
                prevlength = PrevStop - PrevStart

            elif np.count_nonzero(Gene_AND) >= (length * OverLapPercentage) or np.count_nonzero(Gene_AND) >= OverLapNT:
                if length > prevlength:
                    GenePrev = GeneA
                 
                    Both_Frames.popitem(last=False)
                    Both_Frames[key] = value
                    PrevStart = start
                    PrevStop = stop
                    prevlength = PrevStop - PrevStart
