Reputation: 4216
I've coded up a KD-Tree in Java using the "median of list" algorithm for constructing a more balanced tree. It seems to work fine when using the data provided by the wiki, note that the wikipedia example uses only X,Y values, so it doesn't evaluate the Z depth.
From wikipedia:
point_list = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
From my java program:
depth=0 id=(7.0, 2.0, 0.0)
├── [left] depth=1 id=(5.0, 4.0, 0.0)
│ ├── [left] depth=2 id=(2.0, 3.0, 0.0)
│ └── [right] depth=2 id=(4.0, 7.0, 0.0)
└── [right] depth=1 id=(9.0, 6.0, 0.0)
└── [left] depth=2 id=(8.0, 1.0, 0.0)
But when I use the "median of list" approach on this data, it doesn't seem to work properly.
point list = [(1,0,-1), (1,0,-2), (1,0,1), (1,0,2)]
I get a tree like this:
depth=0 id=(1.0, 0.0, 1.0)
├── [left] depth=1 id=(1.0, 0.0, -2.0)
│ └── [left] depth=2 id=(1.0, 0.0, -1.0)
└── [right] depth=1 id=(1.0, 0.0, 2.0)
Which doesn't look correct because (1.0, 0.0, 2.0) is to the right of (1.0, 0.0, 1.0) but they are essentially equal because their Y values are equal. Also, (1.0, 0.0, -1.0) is to the left of (1.0, 0.0, -2.0) and it should be to the right since it's Z value is greater.
I think the problem stems from having equal X and Y values and only variable Z values, so the median of the list doesn't really split the list accurately.
... original code following the wiki's python code ...
private static KdNode createNode(List<XYZPoint> list, int k, int depth) {
if (list == null || list.size() == 0) return null;
int axis = depth % k;
if (axis == X_AXIS) Collections.sort(list, X_COMPARATOR);
else if (axis == Y_AXIS) Collections.sort(list, Y_COMPARATOR);
else Collections.sort(list, Z_COMPARATOR);
KdNode node = null;
if (list.size() > 0) {
int mediaIndex = list.size() / 2;
node = new KdNode(k, depth, list.get(mediaIndex));
if ((mediaIndex - 1) >= 0) {
List<XYZPoint> less = list.subList(0, mediaIndex);
if (less.size() > 0) {
node.lesser = createNode(less, k, depth + 1);
node.lesser.parent = node;
}
}
if ((mediaIndex + 1) <= (list.size() - 1)) {
List<XYZPoint> more = list.subList(mediaIndex + 1, list.size());
if (more.size() > 0) {
node.greater = createNode(more, k, depth + 1);
node.greater.parent = node;
}
}
}
return node;
}
... new code based on my comment ...
private static KdNode createNode(List<XYZPoint> list, int k, int depth) {
if (list == null || list.size() == 0) return null;
int axis = depth % k;
if (axis == X_AXIS) Collections.sort(list, X_COMPARATOR);
else if (axis == Y_AXIS) Collections.sort(list, Y_COMPARATOR);
else Collections.sort(list, Z_COMPARATOR);
KdNode node = null;
if (list.size() > 0) {
int medianIndex = list.size() / 2;
node = new KdNode(k, depth, list.get(medianIndex));
List<XYZPoint> less = new ArrayList<XYZPoint>(list.size()-1);
List<XYZPoint> more = new ArrayList<XYZPoint>(list.size()-1);
//Process list to see where each non-median point lies
for (int i=0; i<list.size(); i++) {
if (i==medianIndex) continue;
XYZPoint p = list.get(i);
if (KdNode.compareTo(depth, k, p, node.id)<=0) {
less.add(p);
} else {
more.add(p);
}
}
if (less.size() > 0) {
node.lesser = createNode(less, k, depth + 1);
node.lesser.parent = node;
}
if (more.size() > 0) {
node.greater = createNode(more, k, depth + 1);
node.greater.parent = node;
}
}
Upvotes: 1
Views: 2574
Reputation: 1627
I think the essential question at this point is: what, exactly, do you want to do with the KD-tree?
Upvotes: 0
Reputation: 6257
The problem indeed has to do with equal coordinates and arises from how you split the nodes into less
and more
parts. Since you have the median index, why not use the index for splitting instead of checking the coordinates? Just change the condition in createNode
on line 116 from
if (KdNode.compareTo(depth, k, p, node.id)<=0) {
to
if (i<medianIndex) {
Btw: there are more efficient algorithms to partition a list into lower, median, upper than sorting. (lower and upper parts do not need to be sorted! see e.g. the implementation of std::nth_element
in the C++ stdlib - sorry, I'm that much into Java programming)
Upvotes: 2