Reputation: 3466
Is there any rule for computing the cosine similarity between two documents that have different numbers of words?
Upvotes: 0
Views: 315
Reputation: 6882
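The standard formula does not require the two documents to have the same number of words. You can just build both word vectors over the union of the words of both documents and sum over that union. Every word that is in B but not in A gives rise to a 0 in the word vector for A, and every word that is in A but not in B gives rise to a 0 in the word vector for B. Both vectors then have the same length, so the dot product and the norms are well defined.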
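As a rough illustration of the union-of-words idea, here is a minimal Python sketch. It assumes raw term counts as vector entries and simple lowercase whitespace tokenization; neither choice comes from the question, and the function name `cosine_similarity` is just a label for this example. TF-IDF weights or a proper tokenizer could be swapped in without changing the structure.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents using raw word counts (an assumption)."""
    # Assumed tokenization: lowercase, split on whitespace.
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())

    # Sum over the union of words; a Counter returns 0 for a missing word,
    # which is exactly the "0 in the word vector" case.
    vocabulary = set(counts_a) | set(counts_b)
    dot = sum(counts_a[word] * counts_b[word] for word in vocabulary)

    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))

    # An empty document has no direction; returning 0.0 here is a convention.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Documents of different lengths (3 words vs. 2 words) work as is.
print(cosine_similarity("the cat sat", "the cat"))
```

Words missing from one document add nothing to the dot product, so in practice only the shared words matter for the numerator, while each norm depends only on that document's own words.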
Upvotes: 2