Reputation: 255
I am looking to write a data structure to represent some genetic data. This data can be represented as a list of size n
, where each entry also has a "genetic position" which is a real number between 0 and 1. To make nomenclature clear, I'll call the position in the list id
and the genetic position gpos
. The way I implemented this is as a class with
class Coords(object):
def __init__(self, *args, **kwargs):
self.f = list(*args, **kwargs)
self.r = dict()
for i,e in enumerate(self.f):
self.r[e] = i
def __setitem__(self,x,y):
self.f.__setitem__(x,y)
self.r.__setitem__(y,x)
def __getitem__(self,x):
return self.f.__getitem__(x)
def __len__(self):
return self.f.__len__()
now, I have two issues with this. The first one is that the indeces of self.r are floats, which is obviously a bad idea. I was thinking of converting them to strings (with a fixed number of digits), but is there a better idea? The other issue I have is that I want to implement accessing entries via gpos
, so if I, for example, would like to access everything between gpos
0.2 and 0.4, I would like to be able to do that using
import numpy as np
Coords(np.arange(1,0,-.1))
c.r[0.2:0.4]
is there an easy way to define that? I was thinking of finding the correct id
of the starting and ending positions using binary search and then access self.f
using these ids, but is there a way to implement above syntax?
Upvotes: 4
Views: 5651
Reputation: 7592
If you know that your gpos
values will (or can) always be stored in sorted order, then I'd definitely recommend using binary search for this task. You can take advantage of the array syntax and numpy
's builtin implementation with searchsorted:
>>> gpos_vals = np.linspace(0, 1, 11)
>>> gpos_vals
array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
>>> lo, hi = gpos_vals.searchsorted([0.22, 0.52])
>>> lo, hi
(3, 6)
>>> gpos_vals[lo:hi]
array([ 0.3, 0.4, 0.5])
I think this nicely avoids the issues that you pointed out regarding using floats for dictionary keys, which could be problematic.
You could also combine this answer with Jaime's and make a class that looks for a slice inside a custom __getitem__
, and then passes the slice parameters to searchsorted
as in my snippet:
class GeneticPositions(object):
def __init__(self, gpos_values):
self.gpos_values = np.asarray(gpos_values)
def __getitem__(self, x):
if isinstance(x, slice):
lo, hi = self.gpos_values.searchsorted(
[x.start or 0, x.stop or 1])
return self.gpos_values[lo:hi]
return self.gpos_values[x]
Upvotes: 3
Reputation: 67457
When you index an object with a slice, Python creates a slice
object with the inputs you provide. For example, if you do c[0.2:0.4]
, then the argument passed to c.__getitem__
will be slice(0.2, 0.4)
. So you could have something like this code in your __getitem__
method:
def __getitem__(self, x):
if isinstance(x, slice):
start = x.start
stop = x.stop
step = x.step
# Do whatever you want to do to define your return
...
If you want to use this fancy indexing not on the Coords
object, but in the self.r
dictionary, I think the easiest would be to create a FancyIndexDict
that is a subclass of dict
, modify its __getitem__
method, and then have self.r
be a FancyIndexDict
, not a dict
.
Upvotes: 5