user2468702
user2468702

Reputation: 141

Nearest neighbor searches in non-metric spaces

I would like to know about nearest neighbor search algorithms when working in non-metric spaces? In particular, is there any variant of a kd-tree algorithm in this setting with provable time complexity etc?

Upvotes: 1

Views: 656

Answers (2)

TilmannZ
TilmannZ

Reputation: 1889

Probably more of theoretic interest for you: The PH-Tree is similar to a quadtree, however, it transforms floating points coordinates into a non-metric system before storing them. The PH-Tree performs all queries (including kNN queries) on the non-metric data using a non-metric distance function (you can define your own distance functions on top of that).

In terms of kNN, the PH-Tree performs on par with trees like R+Trees and usually outperforms kd-trees. The non-metric data storage appears to have little negative, possibly even positive, effect on performance, except maybe for the (almost negligible) execution time for the transformation and distance function.

The reason that the data is transformed comes from an inherent constraint of the tree: The tree is a bit-wise trie, which means it can only store bitsequences (can be seen as integer numbers). In order to store floating point numbers in the tree, we simply use the IEEE bit representation of the floating point number and interpret it as an integer (this works fine for positive number, negative numbers are a bit more complex). Crucially, this preserves the ordering, ie. if a floating point f1 is larger than f2, then the integer representation of the bits of int(f1) is also always larger than int(f2). Trivially, this transformation allows storing floating point numbers as integers without any loss of precision(!).

The transformation is non-metric, because the leading bits (after the sign bit) of a floating point number are the exponent bits, followed by the fraction bits. Clearly, if two number differ in their exponent bits, their distance grows exponentially faster (or slower for negative exponents) compared to distances cause by differences in the fraction bits.

Why did we use a bit-wise try? If we have d dimensions, it allows an easy transformation such that we can map the n'th bit of each of the d values of a coordinate into bit string with d bits. For example, for d=60, we get a 60 bit string. Assuming a CPU register width of 64 bits, this means we can perform many operations related to queries in constant time, i.e. many operations cost just one CPU operation, independent of whether we have 3 dimensions or 60 dimensions. It's probably hard to understand what's going on from this short text, more details on this can be found here.

Upvotes: 1

gsamaras
gsamaras

Reputation: 73366

NMSLIB provides a library for performing Nearest Neighor Search in non-metric spaces. That Github page provides a dozen of papers to read, but not all apply to non-metric spaces.

Unfortunately, there are few theoretical results regarding the complexity of Neaest Neighbor Search for non-metric spaces and there are no comprehensive empirical evaluations.

I can onsly see some theoretical results in Effective Proximity Retrieval by Ordering Permutations, but I am not convinced. However, I suggest you take a look.

There seem to be not many people, if any, that uses k-d trees for non-metric spaces. They seem to use VP trees, etc. densitrees are also used, as described in Near Neighbor Search in Nonmetric Spaces.

Intuitively, densitrees are a class of decorated trees that hold the points of the dataset in a way similar to the metric tree. The critical difference lies in the nature of tree decoration; instead of having one or several real values reflecting some bounds on the triangular inequality attached to every tree node, each densitree node is associated to a particular classifier called here a density estimator.

Upvotes: 0

Related Questions