Reputation: 575
I have a dictionary that looks something like so:
exons = {'NM_015665': [(0, 225), (356, 441), (563, 645), (793, 861)], etc...}
and another file that has a position like so:
isoform pos
NM_015665 449
What I want to do is print the range of numbers that the position in the file is the closest to and then print the number within that range of numbers that the value is closest to. For this case, I want to print (356, 441)
and then 441
. I've successfully figured out a way to print the number in the set of numbers that the value is closest to, but my code below only takes into account 10 values on either side of the numbers listed. Is there any way to take into account that there are a different amount of numbers between each set of ranges?
This is the code I have so far:
with open('splicing_reinitialized.txt') as f:
reader = csv.DictReader(f,delimiter="\t")
for row in reader:
pos = row['pos']
name = row['isoform']
ppos1 = int(pos)
if name in exons:
y = exons[name]
for i, (low,high) in enumerate(exons[name]):
if low -5 <= ppos1 <= high + 5:
values = (low,high)
closest = min((low,high), key = lambda x:abs(x-ppos1))
Upvotes: 1
Views: 87
Reputation: 10503
A functional alternative :). This might even run faster. It clearly is very RAM-friendly and can be easily parallelized due to the perks of functional programming. I hope you'll find it interesting enough to study.
from itertools import imap, izip, ifilter, repeat
def closest_point(position, interval):
""":rtype: tuple[int, int]""" # closest interval point, distance to it
position_in_interval = interval[0] <= position <= interval[1]
closest = min([(border, abs(position - border)) for border in interval], key=lambda x: x[1])
return closest if not position_in_interval else (closest[0], 0) # distance is 0 if position is inside an interval
def closest_interval(exons, pos):
""":rtype: tuple[tuple[int, int], tuple[int, int]]"""
return min(ifilter(lambda x: x[1][1], izip(exons, imap(closest_point, repeat(pos, len(exons)), exons))),
key=lambda x: x[1][1])
print(closest_interval(exons['NM_015665'], 449))
This prints
((356, 441), (441, 8))
The first tuple is a range. The first integer in the second tuple is the closest point in the interval, the second integer is the distance.
Upvotes: 1
Reputation: 15889
I would rewrite it as a minimum distance search:
if name in exons:
y = exons[name]
minDist = 99999 # large number
minIdx = None
minNum = None
for i, (low,high) in enumerate(y):
dlow = abs(low - ppos1)
dhigh = abs(high - ppos1)
dist = min(dlow, dhigh)
if dist < minDist:
minDist = dist
minIdx = i
minNum = 0 if dlow < dhigh else 1
print(y[minIdx])
print(y[minIdx][minNum])
This ignores the search range, just search for the minimum distance pair.
Upvotes: 1