Reputation: 107
I'm trying to put together a 2D KD-tree implementation. At this point it works but the run time explodes for more than ~100k points. It takes 15s for 100k and about 30 mins for 1e6. At first I thought the bottleneck was the sorting to find median values, but it seems to be with the subList and addAll methods. Any suggestions for improvements would be great.
Thanks,
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
public class KDtree {
//****************************************************
//setting up a data set for input
//****************************************************
public kdLite() {
long startTime = System.currentTimeMillis() / 1000;
//select random values to generate data set
double[][] dataSet = new double[2][100000];
for (int i = 0; i < 100000; i++) {
dataSet[0][i] = (Math.random() * (99));
dataSet[1][i] = (Math.random() * (99));
//System.out.print(dataSet[0][i] + "\t" + dataSet[1][i] + "\n");
}
//System.out.print("\n");
//setup a point class for simple data manipulation and add data to it
ArrayList<Point> preSorted = new ArrayList<Point>();
for (int i = 0; i < dataSet[0].length; i++) {
Point point = new Point(i, dataSet[0][i], dataSet[1][i], 0);
preSorted.add(point);
}
//split and sort the list
ArrayList<Point> outList = splitList(preSorted);
// add the list to the binary tree structure
BinaryST buildKD = new BinaryST();
for (int i = 0; i < outList.size(); i++) {
buildKD.insertNode(outList.get(i));
}
long endTime = System.currentTimeMillis() / 1000;
System.out.println((int) (endTime - startTime) / 60 + " Minutes and " + (endTime - startTime) + " Seconds");
// buildKD.printTree();
//****************************************************
}
//****************************************************
//the brunt of the code. this method takes a list of Point objects
//solves for the axis to split on and cuts the list into 2^i segments
//****************************************************
public ArrayList<Point> splitList(ArrayList<Point> arrToSplit) {
ArrayList<ArrayList<Point>> splitList = new ArrayList<ArrayList<Point>>();
ArrayList<Point> Meds = new ArrayList<Point>();
int axis = 0;
int toSplit = 0;
double maxXdif = 0;
double maxYdif = 0;
//populate first bucket
splitList.add(new ArrayList<Point>());
for (int i = 0; i < arrToSplit.size(); i++) {
splitList.get(0).add(arrToSplit.get(i));
}
for (int slice = 0; slice < arrToSplit.size(); slice++) {
//get first bucket that has more than one value then use it first
for (int i = 0; i < splitList.size(); i++) {
if (splitList.get(i).size() >= 1) {
toSplit = i;
if (splitList.get(i).size() > 1) {
break;
}
}
}
if (splitList.get(toSplit).size() > 1) {
sortByX(splitList.get(toSplit));
maxXdif = Math.abs(splitList.get(toSplit).get(0).x - splitList.get(toSplit).get(splitList.get(toSplit).size() - 1).x);
sortByY(splitList.get(toSplit));
maxYdif = Math.abs(splitList.get(toSplit).get(0).y - splitList.get(toSplit).get(splitList.get(toSplit).size() - 1).y);
//arrange by splitting axis according to largest distance to find splitting axis
if (maxXdif > maxYdif) {
axis = 0;
sortByX(splitList.get(toSplit));
} else {
axis = 1;
sortByY(splitList.get(toSplit));
}
//solve for median point .. arbitrate if no point lies on axis (uneven split)
int Med = (int) Math.floor(splitList.get(toSplit).size() / 2);
//take median point, assign splitting axis
splitList.get(toSplit).get(Med).axis = axis;
Meds.add(splitList.get(toSplit).get(Med));
splitList.get(toSplit).remove(Med);
---- >>>>>> PROBLEM CODE
// relocate all points except median to new list, delete the median value
List<Point> head = splitList.get(toSplit).subList(Med, splitList.get(toSplit).size());
splitList.add(new ArrayList<Point>());
splitList.get(splitList.size() - 1).addAll(head);
head.clear();
splitList.get(toSplit).subList(Med - 1, splitList.get(toSplit).size() - 1).clear();
} else {
//these are the leftover points so ordering is arbitrary
//randomize axis to ensure balance
Random random = new Random();
int randomAxis = random.nextInt(2 - 0);
Meds.add(splitList.get(toSplit).get(0));
splitList.get(toSplit).get(0).axis = randomAxis;
splitList.remove(toSplit);
}
}
return Meds;
}
//****************************************************
//****************************************************
//sorting methods for sorting a list by x or y
//must use comparator to sort by custom object attributes
//****************************************************
private ArrayList<Point> sortByX(ArrayList<Point> xList) {
Collections.sort(xList, new Comparator<Point>() {
public int compare(Point p1, Point p2) {
return Double.compare(p1.getX(), p2.getX());
}
});
return xList;
}
private ArrayList<Point> sortByY(ArrayList<Point> yList) {
Collections.sort(yList, new Comparator<Point>() {
public int compare(Point p1, Point p2) {
return Double.compare(p1.getY(), p2.getY());
}
});
return yList;
}
//****************************************************
}
Upvotes: 0
Views: 308
Reputation: 1661
There is a sortByX() and sortByY() call inside splitList() function and the parameter they are taking is not related each other's result. So I think.. as long as your CPU power has some extra resources, maybe you can make those two calculation to run in different Thread and use it when it's done.
Setting initial ArrayList capacity when you create ArrayList is good idea as well. It has default 32 or so and what happened when ArrayList is filled out is.. it creates new internal array with double size than original one, and copy existing items of internal items into new one. It's OK for small length of array, but can be problematic in case like you.
IIRC, there are some implementation difference so that performance is as well for subList(), so if you ran the test with Java6, just try with Java7.
Upvotes: 0
Reputation: 955
Use this:
ArrayList<Point>(int capacity);
Because a new ArrayList is created by default with a capacity of 10 element. It doubles the current capacity everytime it reaches its size by creating a new array and the old one is destroyed by the garbage collector. So in your current case your ArrayList capacity is 10->20->40->80->160->...
Upvotes: 1