Anurag Tripathi

Reputation: 1208

Find similar items based on item attributes

Most of the recommendation algorithms in Mahout require user-item preferences. But I want to find similar items for a given item, and my system doesn't have any user input. For example, for a movie, attributes such as genre, director, and actors could be used to compute a similarity coefficient.

Edit:

I have found a few examples that use the command line, but I want to do it in Java and save the pre-computed values for later use.

Upvotes: 3

Views: 1252

Answers (2)

CrushasaurusRex

Reputation: 35

It sounds like Mahout's spark-rowsimilarity algorithm, available since version 0.10.0, would be a good fit for your problem. It compares the rows of a given matrix (i.e., row vectors representing movies and their properties), looking for cooccurrences of values across those rows; in your case, cooccurrences of genres, directors, and actors. No user history or item interaction is needed. The end result is another matrix mapping each of your movies to the top n most similar other movies in your collection, based on cooccurrence of genre, director, or actor.

The Apache Mahout site has a great write-up regarding how to do this from the command line, but if you want a deeper understanding of what's going on under the covers, read Pat Ferrel's machine learning blog Occam's Machete. He calls this type of similarity content or metadata similarity.

Upvotes: 0

dodo

Reputation: 319

I think that for feature vectors like these, the best similarity measures are the ones based on exact matches, such as Jaccard similarity.

With Jaccard, the similarity between two item vectors is calculated as:

(number of features in the intersection) / (number of features in the union)

So converting the genre to a numerical value will not make a difference: the exact match (which is what determines the intersection) gives the same result for non-numerical values.
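As a rough illustration of the formula above, here is a minimal plain-Java sketch that does not use Mahout at all. The class name `AttributeSimilarity`, the method names, and the `genre:`/`director:`/`actor:` tokens are all made up for this example; it assumes each movie can be represented as a set of attribute strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class for illustration; not part of Mahout.
class AttributeSimilarity {

    /** Jaccard similarity: |A ∩ B| / |A ∪ B|. Returns 0.0 when both sets are empty. */
    public static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    /** Pre-computes, for every item id, the ids of its topN most similar other items. */
    public static Map<String, List<String>> topSimilar(Map<String, Set<String>> items, int topN) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String id : items.keySet()) {
            List<String> others = new ArrayList<>(items.keySet());
            others.remove(id);
            // Sort the other items by descending similarity to the current item.
            others.sort(Comparator.comparingDouble(
                    (String other) -> jaccard(items.get(id), items.get(other))).reversed());
            result.put(id, new ArrayList<>(others.subList(0, Math.min(topN, others.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up attribute tokens; exact string matches drive the intersection.
        Map<String, Set<String>> movies = new LinkedHashMap<>();
        movies.put("m1", Set.of("genre:action", "director:d1", "actor:a1"));
        movies.put("m2", Set.of("genre:action", "director:d1", "actor:a2"));
        movies.put("m3", Set.of("genre:drama", "director:d2", "actor:a3"));
        System.out.println(topSimilar(movies, 2));
    }
}
```

The map returned by `topSimilar` is the pre-computed result; it could be written to a file or a database and looked up later. Note that this compares every pair of items, so it is only a sketch for small collections.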

Take a look at this question for how to do it in Mahout:

Does Mahout provide a way to determine similarity between content (for content-based recommendations)?

Upvotes: 1
