gansub
gansub

Reputation: 1174

Performing a geohash in memory range query

At the outset I would like to say I am not interested in using redis or any other spatial DB. I am trying to do a very simplistic in memory geohash range query and I am using the following software to calculate geohash- geohash-int C package and I have a Cython wrapper to call these APIs in Python 3.6. I am using a SortedList to store the geohashes and my goal is to do a simple geohash range query in memory.

#GeoHash is a Cython wrapper of external C geohash library (link provided)
from geo import GeoHash
from sortedcontainers import SortedList
import numpy as np

import time
minLat = 27.401436
maxLat = 62.54858
minLo = -180.0
maxLo = 179.95000000000002
latGrid = np.arange(minLat,maxLat,0.05)
lonGrid = np.arange(minLo,maxLo,0.05)
geoHash = GeoHash()

print(latGrid.shape,lonGrid.shape)
gridLon,gridLat = np.meshgrid(lonGrid,latGrid)
grid_points = np.c_[gridLon.ravel(),gridLat.ravel()]

sl = SortedList()
geohash1 = {}
t0 = time.time()
for grid_point in grid_points:
   lon = grid_point[0]
   lat = grid_point[1]
   geohash = geoHash.encode(lon,lat,26)
   bitsOriginal = geohash["bits"]
   sl.add(bitsOriginal)
   neighbors = geoHash.get_neighbors(geohash)
   for k,v in neighbors.items():
        bits1 = v["bits"]
        sl.add(bits1)
t1 = time.time()
print(t1-t0)
lonTest = 172.76843
latTest = 61.560745
geohashTest = geoHash.encode(lonTest,latTest,26)
bitsTest =    geohashTest["bits"]

What I want to do is the following

it = sl.irange(bitsTest-window,bitsTest+window)

My question is how do I about calculating the window ? I want the window to be within 0.1 degrees or whatever window I specify. I have no idea on how to calculate the window. The whole geohash package is very fast and I am only interested in approximate matches for my query. I believe my test point should be within "range" of the input data set for which I have calculated geohashes but I have no idea on how to obtain the range of the geohashes for my query point. Can somebody help ?

UPDATE The proposed answer is fine but has a complexity of O( N).If there is a complexity of the order of O(log N) that would be acceptable.

Upvotes: 0

Views: 1344

Answers (2)

GrantJ
GrantJ

Reputation: 8699

Geohashes are designed so that two locations that are near one another will have a similar prefix/value. Wikipedia describes the algorithm with an example. As I understand it, the latitude and longitude are converted to binary values and the bits are interleaved with one another. For example:

In [33]: def geohash(lat, lng):
    ...:     "Approximate geohash algorithm."
    ...:     # Step 1: Convert to fixed-point.
    ...:     # I'm going to support six decimal places.
    ...:     lat = int(lat * 1e6)
    ...:     lng = int(lng * 1e6)
    ...:     # Step 2: Convert integers to 32-bit binary.
    ...:     lat = format(lat, '032b')
    ...:     lng = format(lng, '032b')
    ...:     # Step 3: Interleave bits from lat and lng.
    ...:     bits = [bit for pair in zip(lat, lng) for bit in pair]
    ...:     # Step 4: Convert bits to 64-bit integer.
    ...:     return int(''.join(bits), 2)
    ...: 
    ...: 

In [34]: lat, lng = 37.7749, 122.4194  # San Francisco, CA

In [35]: geohash(lat, lng)
Out[35]: 8215849339476576

If you modify the latitude and longitude only a bit then the number doesn't change much. You can create a bounding box by adding and subtracting from both latitude and longitude:

In [38]: sf = geohash(lat, lng)

In [39]: lower_bounds = geohash(lat-0.001, lng-0.001)

In [40]: upper_bounds = geohash(lat+0.001, lng+0.001)

In [41]: lower_bounds < sf < upper_bounds
Out[41]: True

Now with lower and upper bounds you can use SortedList.irange to find all points near a given latitude and longitude in O(log(n)) time.

Upvotes: 1

Jilles van Gurp
Jilles van Gurp

Reputation: 8294

Sounds like it should be possible. You are looking for 0.1 degree precision. Of course how much that is in meters varies to where you are on the planet and whether we are talking longitude or latitude. But you can calculate that. Based on that you can figure out what the minimum prefix of your gehash needs to be for it's rectangular shape to cover that. Longer hashes with the same prefix are contained in the rectangle that the smaller prefix describes.

For finer granularity, use multiple slightly longer rectangles. This also helps you cover cases where whatever range you are looking at crosses the edge of your rectangle.

Then if you were to generate a set of geohashes of sufficient length that exactly covers a circle with an origin with the range you are looking for, the range query becomes a matter of figuring out if the geohash of your coordinate shares a long enough prefix with that set of geohashes.

You might want to check out my https://github.com/jillesvangurp/geogeometry library. It has algorithms and functions for all of the above. You can do circles, bounding boxes, or polygons and cover those with geohashes of a given max length. You can calculate what an appropriate value is for that max length with another function.

It's java based but it should port easily to python or whatever else you want given how I structured it. Mostly it's just loops and simple math using doubles.

I actually used this to implement a simple geospatial search engine six years ago. Scales quite well if you have a db or search engine that can handle tens of millions of geheohashes. For smaller datasets, you can easily do this in memory.

Upvotes: 1

Related Questions