tigerjack

Reputation: 1178

Which should I use to implement a collaborative filtering on top of Neo4j?

I'm working on a project (a social network) which uses Neo4j (v1.9) as the underlying datastore, together with Spring Data Neo4j. I'm trying to add a tag system to the project and I'm looking for ways to efficiently implement tag recommendation using collaborative filtering strategies. After a lot of research, I've come up with these options:

  1. Cypher. It is the query language built into Neo4j. No other framework is needed, and computation times may be better than with the other options. I could probably implement the queries easily through Spring Data Neo4j.
  2. Apache Mahout. It offers machine learning algorithms focused primarily on collaborative filtering, clustering and classification. However, it isn't designed for graph databases and could potentially be slow.
  3. Apache Giraph. The open source counterpart of Google Pregel.
  4. Apache Spark. It is a fast and general engine for large-scale data processing.
  5. reco4j. It is the best-suited solution I've found so far, but the project seems dead.
  6. Apache Spark GraphX + Mazerunner. Suggested by the answer of @johnymontana. I'm reading up on it. The main issue is that I don't know whether it supports collaborative filtering.
  7. Graphaware Reco. Suggested by @ChristopheWillemsen in a comment. According to the official site, it "is an extensible high-performance recommendation engine skeleton for Neo4j, allowing for computing and serving real-time as well as pre-computed recommendations."

However, I haven't yet figured out whether it works with older versions of Neo4j (I can't upgrade Neo4j at the moment).

So, what do you suggest and why? Feel free to suggest other interesting frameworks not listed above.

Upvotes: 2

Views: 1218

Answers (1)

William Lyon

Reputation: 8546

Cypher is very fast when it comes to local traversals, but it is not optimized for global graph operations. If you want to do something like compute similarity metrics between all pairs of users, then a graph processing framework (like Apache Spark GraphX) would be a better fit. There is a project called Mazerunner that connects Neo4j and Spark that you might want to take a look at.
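To make the "global" case concrete, here is a minimal in-memory sketch of what computing a similarity metric between all pairs of users looks like. It assumes Jaccard similarity over each user's tag set as the metric; the function names and the sample data are hypothetical and are not taken from Neo4j, Spark, or Mazerunner. The number of pairs grows quadratically with the number of users, which is why this kind of job tends to be pushed to a batch framework.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def all_pairs_similarity(user_tags):
    """Return {(user_a, user_b): similarity} for every unordered pair of users.

    This visits n * (n - 1) / 2 pairs, so the cost grows quadratically
    with the number of users.
    """
    return {
        (u, v): jaccard(user_tags[u], user_tags[v])
        for u, v in combinations(sorted(user_tags), 2)
    }


# Hypothetical sample data: user -> set of tags that user has applied.
user_tags = {
    "alice": {"neo4j", "java", "spring"},
    "bob": {"neo4j", "spring", "cypher"},
    "carol": {"python", "spark"},
}

similarities = all_pairs_similarity(user_tags)
for pair, score in sorted(similarities.items()):
    print(pair, round(score, 2))
```

With three users this is trivial, but the same loop over a whole social network touches every pair of users, which is the workload the answer suggests moving out of Cypher.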

For a pure Cypher approach, here and here are a couple of recent blog posts demonstrating Cypher queries for recommendations.
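As a rough illustration of the logic such a recommendation query typically encodes (this is not the code from those posts), the sketch below starts from one user, finds other users who share at least one tag, and ranks the tags those users have that the starting user lacks. It is written in plain Python over an in-memory dictionary; the scoring rule (weighting each candidate tag by the size of the overlap) and all names are assumptions chosen for the example.

```python
from collections import Counter


def recommend_tags(user, user_tags, limit=3):
    """Recommend tags for `user` based on users with overlapping tags.

    Each candidate tag is scored by the summed overlap size of the users
    who carry it; ties are broken alphabetically for determinism.
    """
    own = user_tags[user]
    scores = Counter()
    for other, tags in user_tags.items():
        if other == user:
            continue
        overlap = len(own & tags)
        if overlap == 0:
            continue  # no shared tags, so this user is not a neighbour
        for tag in tags - own:
            scores[tag] += overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:limit]]


# Hypothetical sample data: user -> set of tags that user has applied.
user_tags = {
    "alice": {"neo4j", "java", "spring"},
    "bob": {"neo4j", "spring", "cypher"},
    "carol": {"java", "graphs", "cypher"},
    "dave": {"python"},
}

recommendations = recommend_tags("alice", user_tags)
print(recommendations)
```

Because the work only touches the neighbourhood of a single user, this is the kind of local traversal that the answer describes as Cypher's strength.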

Upvotes: 2
