Reputation: 417
I'm trying to build a basic recommendation system. I first find what the people who liked this movie also liked (user-based collaborative filtering), which gives me a broad mix of movies, because people who liked Toy Story may also like sci-fi movies, for example. Movies of that type have little to do with Toy Story, so I want to filter the results again by genre. Toy Story has 5 genres (Animation, Action, Adventure, etc.), and I want to get only movies that share these genres.
This is my Cypher query:
match (x)<-[:HAS_GENRE]-(ee:Movie{id:1})<-[:RATED{rating: 5}]
-(usr)-[:RATED{rating: 5}]->(another_movie)<-[:LINK]-(l2:Link),
(another_movie)-[:HAS_GENRE]->(y:Genre)
WHERE ALL (m IN x.name WHERE m IN y.name)
return distinct y.name, another_movie, l2.tmdbId limit 200
The first record I get back is Star Wars (1977), which matches only one of Toy Story's genres (Adventure). Please help me write a better Cypher query.
Upvotes: 0
Views: 92
Reputation: 30397
There are a few things we can do to improve the query.
Collecting the genres into lists lets the later WHERE ALL clause compare the full set of genres for each movie instead of one genre per row. We can also hold off on matching to the recommended movie's Link node until we have filtered down to the movies we want to return.
Give this one a try:
MATCH (x)<-[:HAS_GENRE]-(ee:Movie{id:1})
// collect genres so only one result row so far
WITH ee, COLLECT(x) as genres
MATCH (ee)<-[:RATED{rating: 5}]-()-[:RATED{rating: 5}]->(another_movie)
WITH DISTINCT genres, another_movie
// don't match on genre until previous query filters results on rating
MATCH (another_movie)-[:HAS_GENRE]->(y:Genre)
WITH genres, another_movie, COLLECT(y) as gs
WHERE size(genres) <= size(gs) AND ALL (genre IN genres WHERE genre IN gs)
WITH another_movie limit 200
// only after we limit results should we match to the link
MATCH (another_movie)<-[:LINK]-(l2:Link)
RETURN another_movie, l2.tmdbId
As movies are likely to have a very large number of ratings, the match to find movies that were both rated 5 is going to be the most expensive part of the query. If many of your queries rely on a rating of 5, you may want to consider creating a separate [:MAX_RATED] relationship whenever a user rates a movie a 5, and use those [:MAX_RATED] relationships for queries like these. That ensures that you don't initially match to a ton of rated movies that all have to be filtered by their rating value.
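A minimal sketch of the [:MAX_RATED] idea, assuming the same Movie label, RATED relationship and rating property used in the question. The backfill statement is an illustration, not something taken from the original data model; new 5-star ratings would need the same MERGE applied at write time.

```cypher
// One-off backfill: add a MAX_RATED shortcut for every existing 5-star rating.
// MERGE keeps this idempotent, so it is safe to re-run.
MATCH (u)-[r:RATED]->(m:Movie)
WHERE r.rating = 5
MERGE (u)-[:MAX_RATED]->(m);

// The expensive co-rating match can then traverse the shortcut directly,
// with no property filter on the relationship.
MATCH (ee:Movie {id: 1})<-[:MAX_RATED]-()-[:MAX_RATED]->(another_movie)
RETURN DISTINCT another_movie
LIMIT 200;
```

The second MATCH can stand in for the [:RATED{rating: 5}] pattern in the query above; the genre filtering and the Link lookup stay the same.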
Alternatively, if you want to base recommendations on average ratings for movies, you may want to consider caching a computed average of all ratings for every movie (maybe the computation gets rerun for all movies a couple of times a day). If you add an index on the average rating property on movie nodes, it should provide faster matching to movies that are rated similarly.
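As a rough sketch of the cached-average approach: the property name avgRating and the index name movie_avg_rating are hypothetical, chosen only for illustration. The CREATE INDEX form shown is the Neo4j 4.x+ syntax; older 3.x releases use CREATE INDEX ON :Movie(avgRating) instead. The two statements should be run separately, since schema changes and data updates cannot share a transaction.

```cypher
// Recompute and cache the average rating on each movie node.
// avgRating is an assumed property name, not part of the original model.
MATCH (m:Movie)<-[r:RATED]-()
WITH m, avg(r.rating) AS avgRating
SET m.avgRating = avgRating;

// Index the cached property so range lookups on it are fast (Neo4j 4.x+ syntax).
CREATE INDEX movie_avg_rating IF NOT EXISTS
FOR (m:Movie) ON (m.avgRating);
```

A lookup such as MATCH (m:Movie) WHERE m.avgRating >= 4.5 RETURN m can then use the index rather than scanning every rating relationship.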
Upvotes: 1