Reputation: 1575
So when balancing a KD tree you're supposed to find the median and then put all the elements that are less on the left subtree and those greater on the right. But what happens if you have multiple elements with the same value as the median? Do they go in the left subtree, the right or do you discard them?
I ask because I've tried doing multiple things and it affects the results of my nearest neighbor search algorithm and there are some cases where all the elements for a given section of the tree will all have the exact same value and so I don't know how to split them up in that case.
Upvotes: 3
Views: 3144
Reputation: 894
That depends on your purpose.
For problems such as exact-matching or range search, possibility of repetitions of the same value on both sides will complicate the query and repetition of the same value on both leaves will add to the time-complexity.
A solution is storing all of the medians (the values that are equal to the value of median) on the node, neither left nor right. Most variants of kd-trees store the medians on the internal nodes. If they happen to be many, you may consider utilizing another (k-1)d tree for the medians.
Upvotes: 0
Reputation: 77454
It does not really matter where you put them. Preferably, keep your tree balanced. So place as many on the left as needed to keep the optimal balance!
If your current search radius touches the median, you will have to check the other part, that's all you need to handle tied objects on the other side. This is usually cheaper than some complex handling of attaching multiple elements anywhere.
Upvotes: 5
Reputation: 275385
When doing a search style algorithm, it is often a good idea to put elements equal to your median on both sides of the median.
One method is to put median equaling elements on the "same side" as where they where before you did your partition. Another method is to put the first one on the left, and the second one on the right, etc.
Another solution is to have a clumping data structure that just "counts" things that are equal instead of storing each one individually. (if they have extra state, then you can store that extra state instead of just a count)
I don't know which is appropriate in your situation.
Upvotes: 2