Reputation: 87
I have a list of points with coordinates in the form point = (lat,lon). The list can contain several 1000 points. In my old, simple implementation. I am doing this:
def points_in_range(point1,list_of_pts,tolerance):
"""
takes one coordinate, a list of points and the tolerance and
returns a list of indexes of points within range/tolerance from the coordinate.
"""
return [i for i,point2 in enumerate(list_of_pts) if haversine(point1,point2)<= tolerance]
where haversine(lat,lon) is the haversine function.
This is linear in time and we can clearly do better. By sorting the points in the list on latitude and longitude, I thing one could do the same operation in a fraction of this time, because usually only <1% of points actually fits the criteria. By storing the data in a smart way - I could maybe just look at 5% of the points or even less.
My first thought was to do a simple sort on lat and then in each iteration compute max and min latitude, bisect the list wrt to these values and then run points_in_range() on this much smaller list. I could also do a bisect on this smaller list, but I would first have to sort it on lon, so just going for points_in_range() is actually better in terms of big O.
The second idea would be to discretize the whole coordinate system into a 2d array, but that seems very awkward to me.
Does anyone see a good data structure I could use? Thanx.
Upvotes: 1
Views: 1660
Reputation: 899
Have a look at m-tree. There are also many other spacial indexes: http://en.wikipedia.org/wiki/Spatial_database Initially, you construct the data structure (the index) and later just perform a range query.
From the wiki page for m-trees:
For a given query object Q ∈ D and a maximum search distance r(Q), the range query range(Q, r(Q)) selects all the indexed objects Oj such that d(Oj, Q) ≤ r(Q).[2]
The wikipedia page for the m-tree also has the algorithm for range queries.
You can perform range queries in sub linear time. Also this works only if the distance measure you are using obeys trianlge inequality. If haversine (I have never head of it before), obeys triangle inequality, this should work for you.
Upvotes: 2