Reputation: 13
We are trying to build an online recommender (user-user collaborative filtering) using cosine similarity, with the data stored in Neo4j.
**One difference is that the input data set is a boolean preference (as opposed to a rating)** for 1 million users × ~700 products, e.g. columns User_ID, Product_ID, Preference with a row such as 11,48989399,1
I created nodes for users and products, with an index on id (user_id, product_id).
I tried writing a Cypher query to get the top 20 closest neighbours based on the formula:
Similarity = (# of products liked by both users) / (sqrt(# of products liked by user1) * sqrt(# of products liked by user2))
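For readers who prefer set notation, the same formula can be written as below. This is only a restatement with a small made-up example; the symbols $P_1$ and $P_2$ (the sets of products liked by user1 and user2) are introduced here for illustration and do not appear in the original post.

```latex
\[
\mathrm{sim}(u_1, u_2) = \frac{|P_1 \cap P_2|}{\sqrt{|P_1|}\,\sqrt{|P_2|}}
\]
% Hypothetical example: user1 likes 4 products, user2 likes 9, and 3 are shared.
\[
\mathrm{sim}(u_1, u_2) = \frac{3}{\sqrt{4}\,\sqrt{9}} = \frac{3}{2 \cdot 3} = 0.5
\]
```

With boolean preferences every liked product contributes 1 to the vector, so the dot product reduces to the size of the intersection and each vector norm to the square root of the number of likes.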
Below is the query:
MATCH (a:Users)-[d]->() using index a:Users(id) where a.id=1
WITH a as user1, count(d) as user1_prod
MATCH (a:Users)-[]->()<-[dd]-(others) using index a:Users(id) where a.id=1
WITH user1, user1_prod, others, count(dd) as intersect
MATCH (others)-[b1]->() with user1, others as user2, intersect, user1_prod, count(b1) as user2_prod
WITH user1, user2, intersect/(sqrt(user1_prod) * sqrt(user2_prod)) as similarity
RETURN user2, similarity order by similarity desc limit 20;
The query returns results in close to 22 seconds; after that, recommending the products is scalable and fast.
Is there a better way to write the Cypher for the similarity, since the graph might become denser in future scenarios?
Details: Kernel version Neo4j - Graph Database Kernel (neo4j-kernel), version: 2.1.6
772 772 nodes
neostore.relationshipstore.db.mapped_memory 3078M
CentOS release 6.6 (Final)
Upvotes: 0
Views: 201
Reputation: 41686
It will be much faster if you rewrite it as a Neo4j server extension; then you can use node.getDegree(), which retrieves a node's degree in constant time.
The core code would look like this; you can simplify it by extracting a function that collects the products per user.
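// Assumes User is a Label and LIKES a RelationshipType defined elsewhere, OUTGOING is Direction.OUTGOING,
// and the code runs inside a transaction (Neo4j 2.1 Java API).
Node user1 = db.findNodesByLabelAndProperty(User, "id", 1).iterator().next();
int likes1 = user1.getDegree(LIKES, OUTGOING);
Set<Node> products1 = new HashSet<>(likes1);
for (Relationship rel : user1.getRelationships(LIKES, OUTGOING)) {
    products1.add(rel.getEndNode());
}
Node user2 = db.findNodesByLabelAndProperty(User, "id", 2).iterator().next();
int likes2 = user2.getDegree(LIKES, OUTGOING);
Set<Node> products2 = new HashSet<>(likes2);
for (Relationship rel : user2.getRelationships(LIKES, OUTGOING)) {
    products2.add(rel.getEndNode());
}
products1.retainAll(products2); // keep only the products liked by both users
return products1.size() / (Math.sqrt(likes1) * Math.sqrt(likes2));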
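To check the arithmetic without a running database, the same computation can be sketched on plain Java sets. This is a minimal illustration, not the server extension itself: the class name BinaryCosine, the method similarity, and the product ids are all made up for the example, and product ids stand in for the product nodes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BinaryCosine {
    // Cosine similarity for boolean preferences:
    // |A ∩ B| / (sqrt(|A|) * sqrt(|B|)), where A and B are sets of liked product ids.
    static double similarity(Set<Long> liked1, Set<Long> liked2) {
        if (liked1.isEmpty() || liked2.isEmpty()) {
            return 0.0; // a user without likes has no meaningful similarity
        }
        // Iterate over the smaller set and probe the larger one.
        Set<Long> small = liked1.size() <= liked2.size() ? liked1 : liked2;
        Set<Long> large = (small == liked1) ? liked2 : liked1;
        int common = 0;
        for (Long product : small) {
            if (large.contains(product)) {
                common++;
            }
        }
        return common / (Math.sqrt(liked1.size()) * Math.sqrt(liked2.size()));
    }

    public static void main(String[] args) {
        // Hypothetical data: user1 likes 4 products, user2 likes 9, 3 are shared.
        Set<Long> user1 = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L));
        Set<Long> user2 = new HashSet<>(Arrays.asList(2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L));
        System.out.println(similarity(user1, user2)); // prints 0.5
    }
}
```

Counting the overlap by probing the larger set avoids mutating either input, which matters if the per-user product sets are cached and reused across many user pairs.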
Upvotes: 0