Reputation: 3
I have read many tutorials and tried several MinHash LSH implementations, but none of them produces a similarity matrix. They only return the items whose similarity exceeds the threshold. How can I generate the full matrix? My intention is to use the LSH results for clustering.
Upvotes: 0
Views: 872
Reputation: 77495
The whole point of LSH is to avoid computing all pairwise distances, because that takes quadratic time and memory and does not scale.
If you then put the data into a distance matrix, you get all the scalability problems again!
Instead, consider a clustering algorithm such as DBSCAN. It doesn't need a distance matrix, only each point's neighbors within distance epsilon, which is roughly what a thresholded LSH query already returns.
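To illustrate the idea, here is a minimal sketch in plain Python, not a reference implementation. It assumes documents are compared as sets of words, uses a simple banded MinHash index to find candidate neighbors, and then runs a DBSCAN-style expansion on those neighbor lists only. All names and parameters (`minhash_signature`, `lsh_neighbors`, `dbscan_from_neighbors`, `bands`, `rows`, `eps`, `min_pts`) are made up for this example; in practice you would probably use an existing LSH library for the neighbor queries.

```python
import random
import zlib
from collections import defaultdict, deque

PRIME = (1 << 61) - 1  # modulus for the (a*x + b) % PRIME hash family


def make_hash_params(num_perm, seed=0):
    """Draw num_perm (a, b) pairs; fixed seed keeps the run reproducible."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_perm)]


def minhash_signature(tokens, params):
    """MinHash signature of a non-empty token set."""
    base = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    return tuple(min((a * x + b) % PRIME for x in base) for a, b in params)


def lsh_neighbors(signatures, bands, rows, eps):
    """Map each index to the indices at estimated Jaccard distance <= eps.

    Only pairs that share at least one LSH bucket are compared, so no
    full distance matrix is ever built.
    """
    buckets = defaultdict(list)
    for i, sig in enumerate(signatures):
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(i)

    neighbors = {i: {i} for i in range(len(signatures))}
    for members in buckets.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                if j in neighbors[i]:
                    continue
                agree = sum(x == y for x, y in zip(signatures[i], signatures[j]))
                if 1.0 - agree / len(signatures[i]) <= eps:
                    neighbors[i].add(j)
                    neighbors[j].add(i)
    return neighbors


def dbscan_from_neighbors(neighbors, min_pts):
    """DBSCAN-style clustering on precomputed neighbor sets; -1 marks noise.

    A point is a core point if its neighbor set (itself included)
    has at least min_pts members.
    """
    labels = {}
    cluster = 0
    for p in neighbors:
        if p in labels:
            continue
        if len(neighbors[p]) < min_pts:
            labels[p] = -1  # noise for now; may become a border point later
            continue
        labels[p] = cluster
        queue = deque(neighbors[p] - {p})
        while queue:
            q = queue.popleft()
            if labels.get(q, -1) != -1:
                continue  # already belongs to a cluster
            labels[q] = cluster
            if len(neighbors[q]) >= min_pts:
                queue.extend(neighbors[q])  # q is core: keep expanding
        cluster += 1
    return labels


docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumped over the lazy dog",
    "the quick brown fox jumps over a lazy dog",
    "minhash estimates jaccard similarity between sets",
    "minhash estimates the jaccard similarity between sets",
    "completely unrelated sentence about cooking pasta",
]
params = make_hash_params(num_perm=128)
signatures = [minhash_signature(set(d.split()), params) for d in docs]
neighbors = lsh_neighbors(signatures, bands=32, rows=4, eps=0.5)
labels = dbscan_from_neighbors(neighbors, min_pts=2)
print(labels)
```

With these settings the three fox sentences should end up in one cluster, the two MinHash sentences in another, and the pasta sentence should be labelled noise. Because LSH is approximate, a true neighbor can occasionally be missed; more bands (or fewer rows per band) lower that risk at the cost of more candidate comparisons.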
Upvotes: 1