Dat Huynh

Reputation: 88

How can similarity values in recommendation systems such as Mahout be trusted?

I have been playing around with the Mahout recommendation system lately and managed to build a simple recommender with it. But one thing doesn't make sense to me: how can similarity values calculated by math be useful for a recommendation system, especially in ItemBasedSimilarity? I can understand that two users can be similar to each other through the items they like/view/purchase/rate, but how are two items similar to each other?

Upvotes: 0

Views: 101

Answers (2)

Dat Huynh

Reputation: 88

After doing some research, I found my answer here (link). The article only shows two examples, one per metric (Euclidean distance and cosine similarity), but it helped me visualize how the similarity values are computed and therefore why they can be trusted.
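To make the two metrics concrete, here is a minimal sketch in Python (not Mahout code). It treats each item as the vector of ratings it received from the same users. The ratings and function names are made up for illustration, and turning a distance d into a similarity with 1 / (1 + d) is just one common convention assumed here.

```python
import math

def euclidean_similarity(a, b):
    """Map the Euclidean distance between two rating vectors into (0, 1]."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical ratings given by the same four users to three items.
item_a = [5.0, 4.0, 1.0, 2.0]
item_b = [4.0, 5.0, 2.0, 1.0]  # rated much like item_a
item_c = [1.0, 2.0, 5.0, 4.0]  # rated roughly the opposite way

for name, other in (("item_b", item_b), ("item_c", item_c)):
    print(name,
          round(euclidean_similarity(item_a, other), 3),
          round(cosine_similarity(item_a, other), 3))
```

With both metrics, item_b comes out closer to item_a than item_c does, which matches the intuition that users who liked one tended to like the other.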

Upvotes: 0

Dragan Milcevski

Reputation: 776

Item-based similarity (item-item similarity) is analogous to user-based similarity (user-user similarity). As you said, two users are similar to each other through the items they like/view/purchase/rate. Likewise, two items are similar to each other based on some characteristics they share. For example, The Lord of the Rings and The Hobbit are similar because they are both fantasy novels written by J.R.R. Tolkien, their characters overlap, and so on. This often requires more information about the items.

Now, item-based recommendation looks at the items the user liked/viewed/purchased/rated in the past and recommends items similar to them. When producing recommendations, it does not compare the user with other users at all.

The pseudocode of the algorithm goes like this:

for every item i that u has no preference for yet
  for every item j that u has a preference for
    compute a similarity s between i and j
    add u's preference for j, weighted by s, to a running average
return the top items, ranked by weighted average
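The pseudocode above can be sketched in Python roughly as follows. This is not Mahout's implementation: the function names and the ratings are hypothetical, and the similarity is assumed to be the cosine over the users who rated both items, which is only one possible choice. Any other item-item similarity, including one based on item attributes, could be swapped in.

```python
import math

def item_similarity(ratings, i, j):
    """Cosine similarity of items i and j over the users who rated both (assumed metric)."""
    common = [u for u, prefs in ratings.items() if i in prefs and j in prefs]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in common))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in common))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0

def recommend(ratings, user, top_n=3):
    """Rank the items `user` has not rated by a similarity-weighted average."""
    prefs = ratings[user]
    all_items = {item for p in ratings.values() for item in p}
    scores = {}
    for i in all_items - set(prefs):        # items u has no preference for yet
        weighted_sum = 0.0
        similarity_sum = 0.0
        for j, rating in prefs.items():     # items u has a preference for
            s = item_similarity(ratings, i, j)
            weighted_sum += s * rating
            similarity_sum += s
        if similarity_sum > 0:
            scores[i] = weighted_sum / similarity_sum
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "alice": {"lotr": 5, "hobbit": 5, "cookbook": 1},
    "bob":   {"lotr": 4, "hobbit": 5, "cookbook": 2},
    "carol": {"lotr": 5, "cookbook": 1},
    "dave":  {"lotr": 1, "hobbit": 2, "cookbook": 5, "garden": 5},
}

print(recommend(ratings, "carol"))
```

In this toy data, "hobbit" ranks first for carol because it is rated most like "lotr", the item she rated highest.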

The running time of an item-based recommender scales up as the number of items increases, whereas a user-based recommender’s running time goes up as the number of users increases.

Because item-item similarities tend to change more slowly than user-user similarities, they’re better candidates for precomputation. Precomputing similarities takes work, but it speeds up recommendations at runtime.
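As a rough illustration of that trade-off, the sketch below computes every pairwise item similarity once, in an offline step, and stores it in a lookup table so that a recommender only needs a dictionary lookup at request time. The function names, the data, and the choice of cosine similarity are assumptions made for the example. This is not how Mahout stores its similarities.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def precompute_similarities(item_vectors):
    """Offline step: build {(i, j): similarity} for every ordered pair of distinct items."""
    table = {}
    for i, j in combinations(sorted(item_vectors), 2):
        s = cosine(item_vectors[i], item_vectors[j])
        table[(i, j)] = s   # store both orderings so lookups need no sorting
        table[(j, i)] = s
    return table

# Hypothetical item vectors: one rating per user, 0 where the user gave none.
item_vectors = {
    "lotr":     [5, 4, 5, 1],
    "hobbit":   [5, 5, 0, 2],
    "cookbook": [1, 2, 1, 5],
}

similarities = precompute_similarities(item_vectors)

# Runtime step: a constant-time lookup instead of a fresh computation.
print(similarities[("lotr", "hobbit")])
```

The table has one entry per ordered item pair, so the offline cost grows quadratically with the number of items. That is the work being traded for faster recommendations.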

The item-based approach was invented at Amazon to address the scale challenges with user-based filtering.

Upvotes: 2
