Reputation: 1
I am using a KD-Tree to optimize Range searching on a set of 2D points (x,y)
.
To save time, I try to use Java Topology Suite's KD-Tree.
However, the javadoc states that:
Note that the structure of a KD-Tree depends on the order of insertion of the points. A tree may become imbalanced if the inserted points are coherent (e.g. monotonic in one or both dimensions). A perfectly balanced tree has depth of only log2(N), but an imbalanced tree may be much deeper. This has a serious impact on query efficiency
So my question is: How to insert points to a KD-tree in such a way that minimizes the height of the tree?
Upvotes: 0
Views: 331
Reputation: 1899
In wouldn't worry too much about it. It is a bit like hash maps which have in theory worst case O(n) lookup if all entries have to happen the same hash code. In reality this very unlikely to happen unless there is something wrong with the hash function.
To avoid imbalance, you should make sure that your points are not ordered in any way. If they are, consider shuffling them before inserting them.
Also, some kd-tree implementations have a rebalance()
function that can be called after inserting (the bulk of) the data. This will rebalance the tree internally.
Finally, if you really want to use a specific insertion order that avoids imbalance, you can do the following. Note that this approach is not optimal, it will avoid the worst case but will not typically result in a fully balanced tree: Sort the points by x-coordinate, then insert them using a binary split search. For example, if you have 15 points, sort them by 'x' to get a sorted list of point p0...p14. Then:
This results in the following insertion order:
Why is this not ideal?
Upvotes: 1