Reputation: 9286
The goal is to implement an efficient geospatial data structure and queries. More precisely: "get all items within the given bounding rectangle". The bounding rectangle would be defined by longitudeMin, longitudeMax, latitudeMin and latitudeMax.
So the DynamoDB query I had in mind would go like:
KeyConditionExpression:
itemLongitude BETWEEN :longitudeMin AND :longitudeMax
and
itemLatitude BETWEEN :latitudeMin AND :latitudeMax
...where itemLongitude and itemLatitude would be sort keys for the queried table.
However, based on the DynamoDB documentation, KeyConditionExpression accepts only one sort key. Am I understanding things right?
While I'm aware of the Geo Library project, before digging into a paradigm that is completely unfamiliar to me, I want to know how far I can go using only the core DynamoDB features.
Upvotes: 3
Views: 2827
Reputation: 9315
You are right that DynamoDB doesn't support queries with multiple non-EQ conditions: you may query the partition key only for equality, while the sort key can be compared using operators such as BETWEEN, <, > etc. So in order to query for, let's say,
742 <= x <= 1082
113 <= y <= 305
the best you could do with a normal composite key would likely be to partition your data into groups to minimize the number of queries needed. You would still never be able to find all points within the given box with a single query, and you would need some additional filtering as well. Items read by a query and then discarded by a filter still consume read capacity units.
As an example for the data above, we could store floor(x / 100) (or, if you will, the first n digits of the zero-left-padded x value) as the hash key, and use the y coordinate as the sort key. A point [1033; 278] would then be encoded as
hash x y
10 1033 278
The example above could then be queried using:
Query (hash = 07, y BETWEEN 113 AND 305) + Filter x >= 742
Query (hash = 08, y BETWEEN 113 AND 305)
Query (hash = 09, y BETWEEN 113 AND 305)
Query (hash = 10, y BETWEEN 113 AND 305) + Filter x <= 1082
While this works, large boxes would require many queries. The client would also need to filter and merge the resulting data into one data set.
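The query plan above can be generated mechanically. Here is a minimal sketch in Python, assuming non-negative integer coordinates and a bucket size of 100; the function name plan_queries and the dictionary layout are made up for illustration and are not part of any DynamoDB API. It only describes the queries, it does not send them.

```python
def plan_queries(x_min, x_max, y_min, y_max, bucket_size=100):
    """Return one query description per hash bucket overlapping [x_min, x_max].

    Assumes non-negative integer coordinates (hypothetical helper, not a DynamoDB call).
    """
    first = x_min // bucket_size
    last = x_max // bucket_size
    plan = []
    for bucket in range(first, last + 1):
        filters = []
        # Only the two edge buckets can hold points outside the x range,
        # so only they need an extra filter on x.
        if bucket == first and x_min > bucket * bucket_size:
            filters.append(("x", ">=", x_min))
        if bucket == last and x_max < (bucket + 1) * bucket_size - 1:
            filters.append(("x", "<=", x_max))
        plan.append({
            "hash": bucket,
            "y_between": (y_min, y_max),
            "filters": filters,
        })
    return plan


if __name__ == "__main__":
    for query in plan_queries(742, 1082, 113, 305):
        print(query)
```

The number of queries grows linearly with the width of the box divided by the bucket size, which is why large boxes get expensive with this layout.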
A better approach is usually to restructure the data. In the case of geo-coordinates, a common solution is to use geohashes, which are effectively a way to encode a coordinate pair into a single value so that points in proximity to each other are likely to share a common prefix. A geohash can then be compared as a string to find points within a certain area.
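To make the prefix idea concrete, here is a minimal sketch of the classic base-32 text geohash encoding in Python. The function name geohash_encode is my own, and this is not necessarily the scheme the Geo library uses internally; it is only meant to show how interleaving longitude and latitude bits yields a shared prefix for nearby points.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a base-32 geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, precision=5))
```

Each extra character narrows the cell, so a shorter prefix covers a larger area. Note that two points close to each other can still fall on opposite sides of a cell boundary and share no prefix, so a bounding-box search usually has to check several neighbouring cells.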
Since much of this is supported by the DynamoDB Geo library, which you already mentioned, I would suggest using it to simplify the management of geohashes and other coordinate-related conversions.
Upvotes: 4