vomitcuddle

Reputation: 315

Recommendation algorithm (and implementation) for finding similar items and users

I have a database of about 700k users along with items they have watched/listened to/read/bought/etc. I would like to build a recommendation engine that recommends new items based on what users with similar taste in things have enjoyed, as well as actually finding people the user might want to be friends with on a social network I'm building (similar to last.fm).

My requirements are as follows:

Please don't give an answer like "use pysuggest or mahout", since those implement a plethora of algorithms and I'm looking for one that's most suitable for my data/use. I've been interested in Neo4j and how it all could be expressed as a graph of connections between users and items.

Upvotes: 7

Views: 5906

Answers (3)

Alessandro Negro

Reputation: 517

I suggest having a look at my open source project, Reco4j. It is a graph-based recommendation engine that can be used on a graph database like yours in a very straightforward way. Neo4j is the graph database we support. The project is at an early version, but a more complete version will be available very soon. In the meantime we are looking for use cases for the project, so please contact me so that we can see how we might collaborate.

Upvotes: 1

Michael Hunger

Reputation: 41676

Actually, that is one of the sweet spots of a graph database like Neo4j.

So if your data model looks like this:

user -[:LIKE|:BOUGHT]-> item

You can easily get recommendations for a user with a Cypher statement like this:

start user = node:users(id="doctorkohaku")
match user -[r:LIKE]->item<-[r2:LIKE]-other-[r3:LIKE]->rec_item
where r.stars > 2 and r2.stars > 2 and r3.stars > 2
return rec_item.name, count(*) as cnt, avg(r3.stars) as rating
order by rating desc, cnt desc limit 10

This can also be done using the Neo4j Core-API or the Traversal-API.
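
To make the logic of the query easier to follow, here is a rough in-memory sketch in plain Python of what it computes. This is not Neo4j code: the `likes` dictionary, the sample users and items, and the `recommend` function are all made up for illustration, and I'm assuming each user has at most one LIKE relationship per item.

```python
from collections import defaultdict

# Hypothetical sample data: user -> {item: stars}. All names are invented.
likes = {
    "doctorkohaku": {"A": 5, "B": 4},
    "alice": {"A": 4, "C": 5, "D": 3},
    "bob": {"B": 5, "C": 3, "A": 2},
}

def recommend(likes, user, min_stars=2, limit=10):
    """Walk user -> item <- other -> rec_item, keeping only ratings > min_stars."""
    stats = defaultdict(lambda: [0, 0])  # rec_item -> [path count, sum of stars]
    for item, r in likes[user].items():
        if r <= min_stars:
            continue
        for other, other_likes in likes.items():
            # Skip the user themselves and anyone who did not rate this item highly.
            if other == user or other_likes.get(item, 0) <= min_stars:
                continue
            for rec_item, r3 in other_likes.items():
                if rec_item == item or r3 <= min_stars:
                    continue
                stats[rec_item][0] += 1
                stats[rec_item][1] += r3
    rows = [(name, cnt, total / cnt) for name, (cnt, total) in stats.items()]
    # Same ordering as the query: average rating first, then path count.
    rows.sort(key=lambda row: (-row[2], -row[1]))
    return rows[:limit]

print(recommend(likes, "doctorkohaku"))
```

Note that, like the query as written, this sketch does not filter out items the user has already liked, so you would probably want to add that check before showing results.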

Neo4j has a Python API that is also able to run Cypher queries.

Disclaimer: I work for Neo4j

There are also some interesting articles by Marko Rodriguez about collaborative filtering.

Upvotes: 4

Steve

Reputation: 21499

To determine similarity between users, you can compute cosine or Pearson similarity (found in Mahout and pretty much everywhere on the net) across the user vectors. Your data representation should look something like this:

 u1  [1,2,3,4,5,6] 
 u2  [35,24,3,4,5,6] 
 u3  [35,3,9,2,1,11] 

When you want to take multiple items into consideration, you can use the above to determine how similar two users' profiles are. The higher the similarity score, the more likely it is that the two users have very similar items. You can set a threshold so that, for example, two users scoring 0.75 or above are treated as having a similar set of items in their profiles.
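
As a minimal sketch of the two metrics and the threshold idea, using only the standard library: the function names, the `similar_users` helper and the tiny `profiles` dictionary are my own for illustration. In practice you would more likely use NumPy/SciPy or a library implementation on 700k users.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def pearson(a, b):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])

def similar_users(vectors, target, threshold=0.75, metric=cosine):
    """Return (user, score) pairs at or above the threshold, best first."""
    scores = {
        other: metric(vectors[target], vec)
        for other, vec in vectors.items()
        if other != target
    }
    hits = [(other, s) for other, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda pair: -pair[1])

# Hypothetical toy profiles, not real data.
profiles = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [3, 0, 0]}
print(similar_users(profiles, "a"))
```

Cosine looks only at the direction of the vectors, while Pearson also subtracts each user's mean first, which can help when some users rate everything high and others rate everything low.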

Where values are missing you can, of course, fill in values of your own. I'd just keep them binary (1 if the user has the item, 0 otherwise) and try to blend the results of several different algorithms. That's called an ensemble.

Overall, you are looking for something called item-based collaborative filtering for the recommendation aspect of your setup; it can also be used to identify similar items. It's a standard recommendation algorithm that does pretty much everything you've asked for.
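
Here is a rough sketch of item-based collaborative filtering on binary data, just to show the shape of the algorithm. It is one common variant (cosine similarity between items, scores summed over the user's items), not the only way to do it, and the function names and sample data are invented for the example.

```python
import math
from collections import defaultdict

def item_similarities(user_items):
    """Cosine similarity between items, from binary user -> set-of-items data."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sims = defaultdict(dict)
    names = list(item_users)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            common = len(item_users[a] & item_users[b])
            if common:
                s = common / math.sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = s
                sims[b][a] = s
    return sims

def recommend_items(user_items, sims, user, limit=10):
    """Score unseen items by summing their similarity to the user's items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, s in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    return sorted(scores.items(), key=lambda pair: -pair[1])[:limit]

# Hypothetical toy data: user -> set of items they have.
user_items = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
    "u4": {"a"},
    "u5": {"c", "d"},
}
sims = item_similarities(user_items)
print(recommend_items(user_items, sims, "u4"))
```

The `sims` table doubles as the "similar items" lookup. The all-pairs loop is quadratic in the number of items, so for a large catalogue you would likely compute similarities only for items that share at least one user, or do it offline in batches.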

To find similar users, you can compute some type of similarity metric across your user vectors.

Regarding Python, the book Programming Collective Intelligence gives all of its samples in Python, so go pick up a copy and read chapter 1.

Representing all of this as a graph will be somewhat problematic, as your underlying representation is a bipartite graph. There are lots of graph-based recommendation approaches out there, but it's generally not the best-performing approach.

Upvotes: 5
