vikifor

Reputation: 3466

Different document length in computing cosine similarity?

Is there any rule for computing the cosine similarity between two documents that have a different number of words?

Upvotes: 0

Views: 315

Answers (1)

Udo Klein

Reputation: 6882

The standard formula does not require the two documents to have the same number of words. You can just sum over the union of the words of both documents: every word that is in B but not in A gives a 0 in the word vector for A, and every word that is in A but not in B gives a 0 in the word vector for B. Both vectors then have the same dimension, so the dot product and the norms are well defined.
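To illustrate, here is a minimal sketch in Python. It assumes raw term counts as the vector entries and whitespace tokenization with lowercasing; neither choice comes from the question, and the function name `cosine_similarity` is only illustrative. Words missing from one document are treated as 0 in that document's vector.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents using raw word counts (an assumption)."""
    # Assumed tokenization: lowercase, split on whitespace.
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())

    # Sum over the union of words; a Counter returns 0 for a missing word,
    # which is exactly the 0 entry in the other document's vector.
    vocabulary = set(counts_a) | set(counts_b)
    dot = sum(counts_a[word] * counts_b[word] for word in vocabulary)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty document has a zero vector; return 0.0 rather than divide by zero.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    short_doc = "the cat sat"
    long_doc = "the cat sat on the mat"
    print(cosine_similarity(short_doc, long_doc))
```

Because the result is normalized by both vector lengths, a document and a longer one that repeats it in the same proportions still score 1, so the difference in length by itself does not lower the similarity.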

Upvotes: 2
