muudscope
muudscope

Reputation: 7070

Multidimensional search by combining geospatial indexes

I build application that stores millions of floating point vectors, each vector having ~100 dimensions. With a query vector, I need to search through these vectors for the k nearest (euclidean) matches. Run time must be faster than scanning all millions of vectors. By "vector" I mean in the linear algebra term a list of ~100 floating point numbers i.e. [0.3, -15.7, 0.004, 457.1, ...]

I know databases like MySQL and MongoDB provide spatial indexes that work for 2 dimensions. Is there a way to adapt this to many more dimensions maybe with composite indexes? Or are there other other data stores support indexes on more dimension?

Upvotes: 7

Views: 1176

Answers (3)

Krishna
Krishna

Reputation: 411

postgresql. It supports any number of dimensions you want. Default size is 100. This can be changed in cubedata.h file during installation. cube operator within postgres has r tree implemented for fast queries.

Upvotes: 1

Ouroboros
Ouroboros

Reputation: 1534

I can relate to your pain. There is no R-Tree type of implementation in MongoDB, I m not sure if there's one in SQL DBs. I found the following link useful:

http://www.slideshare.net/nknize/mongo-sv-knizefinal

Upvotes: 0

mcdowella
mcdowella

Reputation: 19621

If you are looking for exact matches, 100 dimensions is a lot. If you are prepared to settle for approximate matches, there is a class of Locality-Sensitive-Hashing schemes. You could generate a hash or series of hash values for you data sets and use an ordinary database or a 2-d spatial database to look for matches based on the hash value. One reference is http://people.csail.mit.edu/indyk/p117-andoni.pdf.

Upvotes: 3

Related Questions