Jeremy Fisher

Reputation: 2782

Cosine similarity for user recommendations

Is cosine similarity a good approach for deciding if 2 users are similar based on responses to questions?

I plan to have users answer 10 questions and map each user's responses to a 10-dimensional vector of integers, one component per question. I then plan to use cosine similarity to find similar users.

I considered resolving each question to an integer and summing the integers, so that each user resolves to a single integer. The problem with this approach is that the similarity measure isn't question-specific. For example, if one user gives an answer to question 1 that resolves to 5 and an answer to question 2 that resolves to 0, and another user answers question 1 with 0 and question 2 with 5, both users "sum to 5" even though they answered each question fundamentally differently.

So will cosine similarity give a good similarity measure based on each attribute?

Upvotes: 1

Views: 502

Answers (1)

Dr VComas

Reputation: 735

Summing all the integers into a single integer per user does not seem right, for the reason you describe: it loses which question each answer belongs to.

I think cosine similarity does help here as a similarity measure. You can try others as well, such as Jaccard, Euclidean, Mahalanobis, etc.

What might help is the intuition behind cosine similarity. Once you create the 10-dimensional vectors, you are working in a 10-dimensional space. Each row is a vector in that space, so the number in each component matters. The cosine between two vectors tells you how well or badly aligned those vectors are. If they are parallel, the angle is 0, which means they point in the same direction and their components are all proportional. Similarity is at its maximum in this case (for example, two users who answered every question with exactly the same numbers).

If the components start to differ, as in your example where one user gives 5 to a question and the other gives 0, the vectors will point in different directions. The more the answers differ, the more separated the vectors are and the larger the angle between them, which results in a lower cosine and hence a lower similarity.
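To make the intuition concrete, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the sample vectors are my own, chosen for illustration; the two-question vectors come from the example in the question, cut down to two dimensions to keep it short.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when one vector has no direction.
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_a * norm_b)

# The two users from the question: same sum, opposite answers.
user_a = [5, 0]
user_b = [0, 5]

same_sum = sum(user_a) == sum(user_b)            # True: the sum cannot tell them apart
identical = cosine_similarity(user_a, user_a)    # 1.0: same direction
swapped = cosine_similarity(user_a, user_b)      # 0.0: perpendicular vectors

# Proportional answers also point in the same direction (cosine close to 1).
proportional = cosine_similarity([1, 2, 3], [2, 4, 6])
```

Note the all-zero case: a user whose answers all resolve to 0 has no direction, so you would need to decide separately how to treat them.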

As I mentioned above, there are other similarity measures. One thing people usually do is try several of them against a test set and see which one performs best.
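As a rough illustration of why the choice is worth testing, the sketch below compares cosine similarity and Euclidean distance on the same made-up answers. The vectors and variable names are hypothetical, not real data, and which ranking is "right" depends on what similar should mean for your questions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical answers to four questions.
moderate = [1, 2, 1, 2]
emphatic = [2, 4, 2, 4]    # same pattern as `moderate`, twice the intensity
different = [2, 1, 2, 1]   # opposite pattern, similar intensity

cos_emphatic = cosine_similarity(moderate, emphatic)
cos_different = cosine_similarity(moderate, different)

# math.dist (Python 3.8+) is the Euclidean distance; smaller means closer.
euc_emphatic = math.dist(moderate, emphatic)
euc_different = math.dist(moderate, different)

# Cosine ranks `emphatic` as the closer match to `moderate`;
# Euclidean distance ranks `different` as the closer match.
cosine_prefers_emphatic = cos_emphatic > cos_different
euclidean_prefers_different = euc_different < euc_emphatic
```

The two measures disagree here because cosine ignores vector length while Euclidean distance does not, which is the kind of difference a test set would surface.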

Upvotes: 2
