user2993631
user2993631

Reputation: 13

Finding nearest neighbor(s) in a KD Tree

Warning: Fairly long question, perhaps too long. If so, I apologize.

I'm working on a program involving a nearest neighbor(s) search of a kd tree (in this example, it is an 11 dimensional tree with 3961 individual points). We've only just learned about them, and while I have a good grasp of what the tree does, I get very confused when it comes to the nearest neighbor search.

I've set up a 2D array of points, each containing a quality and a location, which looks like this.

struct point{
  double quality;
  double location;
}

// in main
point **parray; 
// later points to an array of [3961][11] points

I then translated the data so it has zero mean, and rescaled it for unit variance. I won't post the code as it's not important to my questions. Afterwards, I built the points into the tree in random order like this:

struct Node {
  point *key;
  Node *left;
  Node *right;
  Node (point *k) { key = k; left = right = NULL; }
};

Node *kd = NULL;
// Build the data into a kd-tree
random_shuffle(parray, &parray[n]);
for(int val=0; val<n; val++) {
  for(int dim=1; dim<D+1; dim++) {
    kd = insert(kd, &parray[val][dim], dim);
  }
}

Pretty standard stuff. If I've used random_shuffle() incorrectly, or if anything is inherently wrong about the structure of my tree, please let me know. It should shuffle the first dimension of the parray, while leaving the 11 dimensions of each in order and untouched.

Now I'm on to the neighbor() function, and here's where I've gotten confused.

The neighbor() function (last half is pseudocode, where I frankly have no idea where to start):

Node *neighbor (Node *root, point *pn, int d, 
                Node *best, double bestdist) {
  double dist = 0;
  // Recursively move down tree, ignore the node we are comparing to
  if(!root || root->key == pn) return NULL;

  // Dist = SQRT of the SUMS of SQUARED DIFFERENCES of qualities
  for(int dim=1; dim<D+1; dim++) 
    dist += pow(pn[d].quality - root->key->quality, 2);
  dist = sqrt(dist);

  // If T is better than current best, current best = T
  if(!best || dist<bestdist) {
    bestdist = dist;
    best = root;
  }

  // If the dist doesn't reach a plane, prune search, walk back up tree
  // Else traverse down that tree

  // Process root node, return
}

Here's the call to neighbor in main(), mostly uncompleted. I'm not sure what should be in main() and what should be in the neighbor() function:

// Nearest neighbor(s) search
double avgdist = 0.0;
// For each neighbor
for(int i=0; i<n; i++) {
  // Should this be an array/tree of x best neighbors to keep track of them?
  Node *best;
  double bestdist = 1000000000;
  // Find nearest neighbor(s)? 
  for(int i=0; i<nbrs; i++) {
    neighbor(kd, parray[n], 1, best, &bestdist);
  }
  // Determine "distance" between the two?
  // Add to total dist?
  avgdist += bestdist;
}
// Average the total dist
//  avgdist /= n;

As you can see, I'm stuck on these last two sections of code. I've been wracking my brain over this for a few days now, and I'm still stuck. It's due very soon, so of course any and all help is appreciated. Thanks in advance.

Upvotes: 0

Views: 1537

Answers (1)

Has QUIT--Anony-Mousse
Has QUIT--Anony-Mousse

Reputation: 77505

The kd-tree does not involve shuffling.

In fact, you will want to use sorting (or better, quickselect) to build the tree.

First solve it for the nearest neighbor (1NN). It should be fairly clear how to find the kNN once you have this part working, by keeping a heap of the top candidates, and using the kth point for pruning.

Upvotes: 1

Related Questions