roi_saumon
roi_saumon

Reputation: 604

Why use hashing in nearest neighbour search

I have some trouble understanding the purpose of hashing. I don't know much about algorithms but I was interested in understanding locally sensitive hashing for the nearest neighbor search. Given a radius r and a constant c>1, we must preprocess some data points {p1,..., pn} such that when we are given a point q, if there is a point p within a distance r of q, we will return a point from {p1,...,pn} that is within a distance cr of q. Why is hashing helpful in this case?

Upvotes: 1

Views: 326

Answers (1)

ldog
ldog

Reputation: 12141

Local sensitivity hashing is used to hash similar input items into the same "buckets" with high probability. I'm not sure about the definition you have of LSH in your original question, I have never seen it defined in such a way. In particular, the statement "with high probability" is very important to seeing why LSH is useful.

At a high level, LSH is the opposite of typical hashing functions. With typical hashing functions, we want to loosely guarantee minimal bucket collisions, even/especially for similar input items. This property is useful to construct hash tables that yield efficient operations.

In LSH, the goal is to produce a hashing function such that two points p1 and p2 in a d dimensional space fall into the same bucket with high probability when p1 and p2 are "close" where close is defined by a given metric or distance (typically L1 or L2.) Similarly, two points p1 and p2 that are not close should fall into different buckets with high probability.

To give a typical example of a family of functions that are used to construct a LSH, the function

f(x) = 1 if dot(x,r) > 0 else 0

with r drawn from a d dimensional normal distribution. Drawing m such functions will yield a m bit hash. Using n such m bit hashes will yield a basic LSH that can be used to approximately find nearest neighbours under a given metric (typically L1 or L2.)

Upvotes: 1

Related Questions