user3636583

Reputation: 177

Spark implementation for Locality Sensitive Hashing

As part of a project for my studies, I'm looking for a way to use the hash functions of LSH (Locality Sensitive Hashing) with Spark. Is there a way to do this?

Upvotes: 2

Views: 3812

Answers (2)

xenocyon

Reputation: 2498

The recently released version of Spark (2.1.0) provides built-in support for LSH, but apparently only in the Scala API (not in PySpark yet).

Upvotes: 1

Nilesh

Reputation: 1222

Try this implementation:

https://github.com/mrsqueeze/spark-hash

Quoting from the README, "this implementation was largely based on the algorithm described in chapter 3 of Mining of Massive Datasets", a book that has a great description of LSH and minhashing.
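
To give a feel for the minhash-plus-banding idea that chapter describes, here is a minimal single-machine sketch in plain Python. It is not the spark-hash code and not a Spark API; the function names (`make_hash_params`, `minhash_signature`, `estimate_jaccard`, `lsh_candidate_pairs`), the choice of prime, and the use of CRC32 to turn tokens into integers are all assumptions made for illustration. It also assumes every input set is non-empty.

```python
import random
import zlib
from collections import defaultdict

# Assumption: a large Mersenne prime (2^61 - 1) as the modulus for the
# universal hash family h(x) = (a * x + b) mod PRIME.
PRIME = (1 << 61) - 1


def make_hash_params(num_hashes, seed=42):
    """Draw (a, b) coefficients for num_hashes hash functions."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    """Return the minhash signature of a non-empty set of string tokens."""
    # CRC32 gives a stable integer id per token (unlike the built-in hash(),
    # which is randomized per process for strings).
    ids = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def lsh_candidate_pairs(signatures, bands):
    """Split each signature into bands; items that share any band
    (same values in the same band position) become candidate pairs."""
    rows = len(next(iter(signatures.values()))) // bands
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for key, sig in signatures.items():
            chunk = tuple(sig[band * rows:(band + 1) * rows])
            buckets[chunk].append(key)
        for keys in buckets.values():
            for i in range(len(keys)):
                for j in range(i + 1, len(keys)):
                    candidates.add(tuple(sorted((keys[i], keys[j]))))
    return candidates


if __name__ == "__main__":
    params = make_hash_params(num_hashes=100)
    docs = {
        "d1": {"spark", "locality", "sensitive", "hashing"},
        "d2": {"spark", "locality", "sensitive", "hash"},
        "d3": {"completely", "unrelated", "words"},
    }
    sigs = {name: minhash_signature(toks, params) for name, toks in docs.items()}
    print(estimate_jaccard(sigs["d1"], sigs["d2"]))
    print(lsh_candidate_pairs(sigs, bands=20))
```

In a Spark job, the signature step would typically run per record (for example inside a `map`), and the band chunks would serve as grouping keys, but how spark-hash actually wires this up is best checked in its README.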

Upvotes: 3
