Martin Drozdik
Martin Drozdik

Reputation: 13303

Are there any nearest neighbor data structures using a user provided hint?

I am looking for a data structure holding vectors in R^n which could perform nearest neighbor queries using a user provided hint as to which vector is likely to be close to the query. For example:

class NearestNeighborStructure;
...
NearestNeighborStructure structure;
Vector vec1 = {1,0,0};
Vector vec2 = {1,1,0};
Vector vec3 = {1,1,1};
structure.insert(vec1);
structure.insert(vec2);
structure.insert(vec3);

Now let us suppose that I want to find the nearest neighbor to

Vector query = {0,0,0};

and for some cryptic reason I believe that vec2 is quite close to query. So I call:

Vector nn = structure.findNNusingHint(query, vec2);  // vec2 is a hint
assert(nn == vec1); // vec1 is the correct nearest neighbor

The data structure should make use of the hint I provided. It should improve upon it until it gets the true nearest neighbor.

Bonus points if the structure supports insertions and deletions.

EDIT:

I am looking for structures that can compute the nearest neighbor in sublinear time. At least in some cases. Something like k-d trees or cover trees.

My problem has these characteristics:

  1. Dimensionality too high for k-d trees (5 - 50 dimensions)
  2. Frequent insertions and deletions
  3. I always have a good guess as to which vector is near to the query

I want to use this structure in an evolutionary algorithm used for continuous mathematical optimization. The algorithm contains a large population of candidate solutions. Each solution should be aware of its surroundings in this algorithm.

New candidate solutions are created by slightly modifying existing solutions. That is the existing solution out of which the new solution is created is the hint I want to use.

Upvotes: 3

Views: 1066

Answers (4)

Cybercartel
Cybercartel

Reputation: 12592

There is the nearest neighbor graph. You can find it by looking for the closest edge to the point and all nearest edge, I.e triangles in a delaunay triangulation. Or you can use a spatial index, for example a space filling curve. You can download my php class hilbert curve at phpclasses.org. It uses a hilbert curve to find the hamiltonian path.

Upvotes: 0

Matthieu M.
Matthieu M.

Reputation: 299870

I believe you should try and aim for an R-Tree or M-Tree.

In both cases, the idea is to use a bounding volume approach. For example, if you build a binary tree, you split the space in half with a hyper-plan (represented as a Vector). The hyper-plan might not be aligned on a particular axis, and the sign of the dot-product with your vector indicates whether to continue looking left or right (when placing/deleting a vector).

For the nearest neighbor search you have to:

  • locate the nearest neighbor in the same bounding volume than the node itself, this is the candidate.
  • if this candidate is closer than the boundary of the volume, you are done.
  • if it is not, then you need to search in the neighboring volumes that are closer than the candidate.

I realize this is very high-level... specifically, I do not know efficient partitioning or update strategies. The R-Tree article references several approaches.

Upvotes: 1

user2249683
user2249683

Reputation:

You could do like this:

#include <vector>
#include <iostream>

struct Point { int x; int y; int z; };

inline std::ostream& operator << (std::ostream& stream, const Point& p) {
    return stream << p.x << ", " << p.y << ", " << p.z;
}

struct NearestNeighborStructure
{
    NearestNeighborStructure(std::size_t num_points) {
        m_points.reserve(num_points);
    }

    void insert(const Point& p) {
        m_points.push_back(p);
    }

    int manhattan_distance(const Point& a, const Point& b) const {
        using std::abs;
        return abs(b.x - a.x) + abs(b.y - a.y)  + abs(b.z - a.z);
    }

    const Point& find(const Point& query /* ignored hint*/) const {
        std::size_t result = 0;
        int distance = std::numeric_limits<int>::max();
        for(std::size_t i = 0; i < m_points.size(); ++i) {
            int d = manhattan_distance(query, m_points[i]);
            if(d < distance) {
                result = i;
                distance = d;
            }
        }
        return m_points[result];
    }

    private:
    std::vector<Point> m_points;
};

int main()
{
    Point p[3] = { {1,0,0}, {1,1,0},  {1,1,1} };
    NearestNeighborStructure structure(3);
    structure.insert(p[0]);
    structure.insert(p[1]);
    structure.insert(p[2]);

    Point query = {0,0,0};
    std::cout << "Nearest: " << structure.find(query) << std::endl;
    return 0;
}

You might use a different container (eg a deque) if you do not know the amout of points you will insert.

Upvotes: 0

Related Questions