Reputation: 1894
If I have a vector (for example: (5,4,6,8)
) in my application and I want to find similarity to other vector in my DB, let say for simplicity that I'm calculating distance between two vectors with Manhattan distance.
What I need is a way to calculate the algorithm (Manhattan distance in my example) between my vector and all the vectors that are stored in my DB, Can I do 10 million vectors under a couple of seconds ?
Upvotes: 0
Views: 79
Reputation: 640
If You really deal with a lot of data, what You really need is an Approximate Near Neighborhood - http://en.wikipedia.org/wiki/Nearest_neighbor_search#Approximate_nearest_neighbor implementation. Take look at Annoy - https://pypi.python.org/pypi/annoy/1.8.0 project page. There is a benchmark with other ANN projects wich You can find interesting. Maybe there is a implementation as a plugin for DB, but I am not aware of such. However, ANN can be also used to pre-compute top-n NN and store them in DB as a list for User/Item.
Upvotes: 1