YUVAL MIZRAHI

Reputation: 13

Writing a query in Neo4j for finding specific relations between nodes

Let's say I have a Neo4j database that contains persons and movies. Two people can be connected by a 'friend' relationship, and a person and a movie can be connected by a 'like' and/or 'watch' relationship.

I am still unfamiliar with Neo4j and with writing queries for it. How can I write a query that gets all the movies that Aviv both watched (Watch) and liked (Like), and that two of Aviv's friends also watched or liked? (Those two friends can be up to level 3, meaning Aviv's friends, Aviv's friends' friends, or Aviv's friends' friends' friends.)

So far I have succeeded in finding all the movies that Aviv 'like' & 'watch', and all of Aviv's friends at levels 1 to 3:

MATCH ({name:'Aviv'})-[:friend*1..3]->(f:Person) 
WHERE not f.name = 'Aviv'
WITH collect (f) AS friends

MATCH (m:Movie) 
WHERE (m)<-[:watched]-({name: "Aviv"}) AND (m)<-[:liked]-({name: "Aviv"}) 
WITH collect (m) AS mov,friends

There's a picture of the database attached below.

[image: graph of the sample database]

Upvotes: 1

Views: 228

Answers (1)

InverseFalcon

Reputation: 30397

Let's fix up the first part of your query first, then look at the rest.

You're matching Aviv's node more than once, which is redundant. It's best to bind Aviv's node to a variable once so you can reuse it in the rest of your query.

You should use the :Person label for Aviv's node in your match, and make sure you have an index on :Person(name) so your query can use an index lookup to find Aviv's node fast, as this is the starting node in the graph.
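As a rough sketch of the index suggestion, assuming Neo4j 4.x or later (older 3.x releases use the `CREATE INDEX ON :Person(name)` form instead), the index could be created and checked like this. The index name `person_name` is a hypothetical choice, not something from the question.

```cypher
// Create an index on :Person(name). The name person_name is hypothetical.
CREATE INDEX person_name IF NOT EXISTS
FOR (p:Person) ON (p.name);

// EXPLAIN shows the plan without running the query. With the index in place,
// the plan should start with a NodeIndexSeek rather than a NodeByLabelScan.
EXPLAIN
MATCH (a:Person {name: 'Aviv'})
RETURN a;
```

Without the :Person label in the pattern, the planner cannot use this index, which is why the label matters on the starting node.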

Also, the second part where you match on movies Aviv liked and watched is considering all :Movie nodes and filtering, rather than getting the initial set of movies that Aviv liked or watched first. Use the pattern in your MATCH rather than the WHERE clause here.

If the :friend relationship is always symmetrical as in your example (where the relationship always comes in pairs for both directions), it's better to use just a single relationship, and treat it as undirected in your query (as a single :friend relationship is enough to determine the two are friends, no need for a redundant relationship).
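A minimal sketch of the single-relationship approach, assuming both people already exist as :Person nodes. 'Dana' is a hypothetical friend used only for illustration.

```cypher
// Store one :friend relationship per pair. With no direction given, MERGE
// reuses an existing :friend relationship in either direction and only
// creates one (in an arbitrary direction) if none exists.
MATCH (a:Person {name: 'Aviv'}), (b:Person {name: 'Dana'})
MERGE (a)-[:friend]-(b);

// Query it without an arrowhead so the stored direction does not matter.
MATCH (a:Person {name: 'Aviv'})-[:friend]-(f:Person)
RETURN f.name;
```

If the graph already contains a relationship in each direction, an undirected variable-length match still works, but it has more paths to expand.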

Lastly, it's probably best to switch the order of your operations. In a large graph, matching from the movies to everyone who has watched or liked them (and only then filtering down to the friends you matched earlier) is likely to be more expensive than matching from the friends to the movies they have liked or watched (and only then filtering down to the movies you matched earlier).

// anchor on Aviv first so the index lookup is the starting point
MATCH (a:Person {name:'Aviv'})-[:watched]->(m:Movie), (a)-[:liked]->(m)
WITH a, collect(m) AS movies

MATCH (a)-[:friend*1..3]-(f:Person)
WHERE a <> f // faster way to ensure Aviv isn't included
WITH DISTINCT f, movies  // deduplicate

MATCH (f)-[:watched|liked]->(m)
WHERE m IN movies
WITH m, count(DISTINCT f) AS friendWatchedOrLikedCount
WHERE friendWatchedOrLikedCount = 2
RETURN m

This line: WITH m, count(distinct f) as friendWatchedOrLikedCount makes sure that we count the distinct people per movie who watched or liked it. That is, if only one friend both watched and liked the movie, it won't be returned, since your criterion is that exactly 2 friends liked or watched it.

And finally, according to your sample graph, no results will be returned: there are only two movies Aviv has both watched and liked (manInBlack and spiderMan, if my guesses are correct), but for one of them only one friend has liked the movie, and for the other only one friend has watched it.
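If the requirement is really "at least two friends" rather than exactly two, or if you want to see which friends matched, one possible variation is to collect the friends instead of counting them. This is only a sketch under the same assumptions as the query above (a `name` property on :Person, and the relationship types `watched`, `liked`, and `friend`).

```cypher
MATCH (a:Person {name: 'Aviv'})-[:watched]->(m:Movie), (a)-[:liked]->(m)
WITH a, collect(m) AS movies

MATCH (a)-[:friend*1..3]-(f:Person)
WHERE a <> f
WITH DISTINCT f, movies

MATCH (f)-[:watched|liked]->(m)
WHERE m IN movies
// collect distinct friend nodes (not names) so that two friends
// who happen to share a name are still counted separately
WITH m, collect(DISTINCT f) AS friends
WHERE size(friends) >= 2
RETURN m, [x IN friends | x.name] AS friendNames
```

Changing `>= 2` back to `= 2` gives the same filter as the original count-based version.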

Upvotes: 1
