cosmictypist
cosmictypist

Reputation: 575

Finding nearest set of numbers given a position

I have a dictionary that looks something like so:

exons = {'NM_015665': [(0, 225), (356, 441), (563, 645), (793, 861)], etc...}

and another file that has a position like so:

isoform    pos    
NM_015665    449

What I want to do is print the range of numbers that the position in the file is the closest to and then print the number within that range of numbers that the value is closest to. For this case, I want to print (356, 441) and then 441. I've successfully figured out a way to print the number in the set of numbers that the value is closest to, but my code below only takes into account 10 values on either side of the numbers listed. Is there any way to take into account that there are a different amount of numbers between each set of ranges?

This is the code I have so far:

with open('splicing_reinitialized.txt') as f:
    reader = csv.DictReader(f,delimiter="\t")
    for row in reader:
        pos = row['pos']
        name = row['isoform']
        ppos1 = int(pos)
        if name in exons:
            y = exons[name]
            for i, (low,high) in enumerate(exons[name]):
                if low -5 <= ppos1 <= high + 5:
                    values = (low,high)
                    closest = min((low,high), key = lambda x:abs(x-ppos1))

Upvotes: 1

Views: 87

Answers (2)

Eli Korvigo
Eli Korvigo

Reputation: 10503

A functional alternative :). This might even run faster. It clearly is very RAM-friendly and can be easily parallelized due to the perks of functional programming. I hope you'll find it interesting enough to study.

from itertools import imap, izip, ifilter, repeat


def closest_point(position, interval):
    """:rtype: tuple[int, int]"""  # closest interval point, distance to it
    position_in_interval = interval[0] <= position <= interval[1]
    closest = min([(border, abs(position - border)) for border in interval], key=lambda x: x[1])
    return closest if not position_in_interval else (closest[0], 0)  # distance is 0 if position is inside an interval


def closest_interval(exons, pos):
    """:rtype: tuple[tuple[int, int], tuple[int, int]]"""
    return min(ifilter(lambda x: x[1][1], izip(exons, imap(closest_point, repeat(pos, len(exons)), exons))), 
               key=lambda x: x[1][1])


print(closest_interval(exons['NM_015665'], 449))

This prints

((356, 441), (441, 8))

The first tuple is a range. The first integer in the second tuple is the closest point in the interval, the second integer is the distance.

Upvotes: 1

Imanol Luengo
Imanol Luengo

Reputation: 15889

I would rewrite it as a minimum distance search:

if name in exons:
    y = exons[name]
    minDist = 99999 # large number
    minIdx = None
    minNum = None
    for i, (low,high) in enumerate(y):
        dlow = abs(low - ppos1)
        dhigh = abs(high - ppos1)
        dist = min(dlow, dhigh)
        if dist < minDist:
            minDist = dist
            minIdx = i
            minNum = 0 if dlow < dhigh else 1
    print(y[minIdx])
    print(y[minIdx][minNum])

This ignores the search range, just search for the minimum distance pair.

Upvotes: 1

Related Questions