Reputation: 604
I have some trouble understanding the purpose of hashing. I don't know much about algorithms but I was interested in understanding locally sensitive hashing for the nearest neighbor search. Given a radius r
and a constant c>1
, we must preprocess some data points {p1,..., pn}
such that when we are given a point q, if there is a point p
within a distance r
of q
, we will return a point from {p1,...,pn}
that is within a distance cr
of q
. Why is hashing helpful in this case?
Upvotes: 1
Views: 326
Reputation: 12141
Local sensitivity hashing is used to hash similar input items into the same "buckets" with high probability. I'm not sure about the definition you have of LSH in your original question, I have never seen it defined in such a way. In particular, the statement "with high probability" is very important to seeing why LSH is useful.
At a high level, LSH is the opposite of typical hashing functions. With typical hashing functions, we want to loosely guarantee minimal bucket collisions, even/especially for similar input items. This property is useful to construct hash tables that yield efficient operations.
In LSH, the goal is to produce a hashing function such that two points p1
and p2
in a d
dimensional space fall into the same bucket with high probability when p1
and p2
are "close" where close is defined by a given metric or distance (typically L1 or L2.) Similarly, two points p1
and p2
that are not close should fall into different buckets with high probability.
To give a typical example of a family of functions that are used to construct a LSH, the function
f(x) = 1 if dot(x,r) > 0 else 0
with r
drawn from a d
dimensional normal distribution. Drawing m
such functions will yield a m
bit hash. Using n
such m
bit hashes will yield a basic LSH that can be used to approximately find nearest neighbours under a given metric (typically L1 or L2.)
Upvotes: 1