Reputation: 185
I'm looking for an implementation of a spatial index that allows me to quickly count and sum up the values that are contained in a specified region.
The longer version: I have a lot of objects that I want to store in a spatial index. They each have their coordinates in the n-dimensional space as well as one extra value. Given a range I need a quick answer to the questions of (1) how many objects are within the range and (2) what is the sum of all their values.
I know that a spatial index is usually implemented using R-trees. Of course I could simply retrieve all the objects within a range and sum them up each time.
However, it seems like there is a significant speed up opportunity by storing the sum and count of all elements contained under a node within that very node. Thus, once the node in question is entirely within the queried range, it is not necessary to descend the tree any further.
Does anyone know a C++ implementation that supports these kind of "cached" operations?
Upvotes: 3
Views: 626
Reputation: 2098
So what you're thinking about is an augumented R-tree. It's interesting though I'm guessing to benefit from this augumentation the query regions would have to be quite big WRT the bounding boxes of nodes and values stored in the R-tree. Otherwise the query would be forced to still always check leaf nodes (but there'd be an overhead - counters, additional checks).
Indeed as Justin R. said Boost.Geometry R-tree implementation doesn't store any counters in nodes, allow to define additional data stored in nodes or user-defined queries, at least for now (Boost 1.57).
However it'd be possible to optimize this counting query. It's not required to return any Values, create and fill a temporary container, etc. Instead values could be counted on-the-fly during the query, e.g. like this in C++11:
size_t counter = 0;
rtree.query(bgi::intersects(box),
boost::make_function_output_iterator(
[&](Value const&) {
counter++;
}));
Upvotes: 1
Reputation: 24061
Boost has a good R-tree implmentation, though I don't think the functionality that you're looking for is built-in.
One approach would be to modify your node's data type to include an additional field to represent the subtree metadata (child count and subtree sum), or make the node a tuple of your current type and the metadata. Whenever you add, edit or delete a node, those functions would call an update function that would walk up chain of parent nodes, incrementing or decrementing the metadata.
I suspect that if you are going to bullk-loading the data, this is even easier, as you can do it in just two passes, one to go through and calculate the metadata for each node, and then do a series of inserts that don't perform the update function.
If you're not going to bulk-load, another common spatial index is the quadtree. This data structure is often better suited to spatial data that is frequently updated, as it doesn't need to re-balance all the time. I use quadtrees more than R-trees, and find them super flexible.
Upvotes: 3