Simhash is a text similarity algorithm proposed by Moses Charikar in his paper "Similarity Estimation Techniques from Rounding Algorithms". In the original paper, however, he proposed using random vectors as part of the hash function. Common implementations today instead add or subtract each word's weight at every bit position, depending on the corresponding bit of the word's hash. This means we are using signed random vectors (vectors whose entries are only 1 and -1) instead of general random vectors.
This seems to decrease the accuracy of the result, but by how much? Is there a mathematical way to quantify the loss?
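To make the two variants concrete, here is a minimal sketch of what I mean, not a reference implementation. All function names are made up for illustration. The Gaussian planes stand in for the random vectors of the original paper, and the +1/-1 planes stand in for the common approach; as a simplifying assumption, the signs are drawn from a seeded random generator rather than from the bits of a real word hash. The fraction of matching fingerprint bits is compared with 1 - θ/π, the collision probability given in the paper for random hyperplanes, where θ is the angle between the two weight vectors.

```python
import math
import random


def sign_bits(vec, planes):
    """One fingerprint bit per plane: 1 if the projection is non-negative."""
    return [1 if sum(x * p for x, p in zip(vec, plane)) >= 0 else 0
            for plane in planes]


def gaussian_planes(n_bits, dim, rng):
    """Random vectors with i.i.d. standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signed_planes(n_bits, dim, rng):
    """Signed random vectors: every entry is +1 or -1.

    Assumption: a seeded RNG replaces the word-hash bits used in practice.
    """
    return [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(n_bits)]


def agreement(u, v, planes):
    """Fraction of fingerprint bits on which u and v agree."""
    bits_u, bits_v = sign_bits(u, planes), sign_bits(v, planes)
    return sum(a == b for a, b in zip(bits_u, bits_v)) / len(planes)


def expected_agreement(u, v):
    """1 - theta/pi, the per-bit collision probability for random hyperplanes."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return 1.0 - math.acos(cos) / math.pi


rng = random.Random(0)
dim, n_bits = 50, 2000

# Two hypothetical word-weight vectors over a 50-word vocabulary.
u = [rng.random() for _ in range(dim)]
v = [x + 0.5 * rng.random() for x in u]

print("expected (1 - theta/pi):", expected_agreement(u, v))
print("gaussian planes:        ", agreement(u, v, gaussian_planes(n_bits, dim, rng)))
print("signed (+1/-1) planes:  ", agreement(u, v, signed_planes(n_bits, dim, rng)))
```

Comparing the printed numbers for different vocabulary sizes and weight distributions gives an empirical feel for the gap, but it does not answer whether there is a closed-form bound.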