Efficiently Find Unique Intermediate Nodes Connecting Two Sets of Nodes in Neo4j/Cypher

Question

I have data on movies that can be either comedies or dramas. I have actors in those movies, who can have multiple roles per movie. I want to find all distinct sets of movies and actors where:

(drama1:Movie {Genre:'Drama'})-[role1]-(actor1:Actor)-[role2]-(comedy:Movie {Genre:'Comedy'})-[role3]-(actor2:Actor)-[role4]-(drama2:Movie {Genre:'Drama'})

That is, I want to find cases where two different dramas are connected through a comedy with which each drama shares at least one actor. I'm struggling to do this efficiently and to get Neo4j to return distinct groups of drama1, drama2, actor1, actor2, comedy. My data is on the order of a few million nodes and tens of millions of relationships, so efficiency is important. A toy setup, which can be pasted into the Neo4j online console, is:

create
  (a:Movie {Genre:'Comedy'}), (b:Movie {Genre:'Comedy'}), (c:Movie {Genre:'Comedy'}), (d:Movie {Genre:'Comedy'}),
  (f:Movie {Genre:'Drama'}), (h:Movie {Genre:'Drama'}),
  (i:Actor {Name:'Sarah'}), (j:Actor {Name:'Maria'}), (k:Actor {Name:'Mike'}), (l:Actor {Name:'Jane'}),
  (m:Actor {Name:'Sam'}), (q:Actor {Name:'Matt'}), (r:Actor {Name:'Tom'}),
  (i)-[:ActedIn]->(a), (i)-[:ActedIn]->(a), (i)-[:ActedIn]->(a), (i)-[:ActedIn]->(a), (i)-[:ActedIn]->(f),
  (j)-[:ActedIn]->(b), (j)-[:ActedIn]->(h), (j)-[:ActedIn]->(h),
  (q)-[:ActedIn]->(c), (q)-[:ActedIn]->(b), (q)-[:ActedIn]->(a),
  (r)-[:ActedIn]->(f), (r)-[:ActedIn]->(f), (r)-[:ActedIn]->(a),
  (j)-[:ActedIn]->(b), (j)-[:ActedIn]->(c),
  (k)-[:ActedIn]->(d), (l)-[:ActedIn]->(c),
  (i)-[:ActedIn]->(a), (i)-[:ActedIn]->(h), (m)-[:ActedIn]->(h)

I've mostly tried variations of:

match (drama1:Movie {Genre:'Drama'})-[role1]-(actor1:Actor)-[role2]-(comedy:Movie {Genre:'Comedy'})-[role3]-(actor2:Actor)-[role4]-(drama2:Movie {Genre:'Drama'})
return drama1, actor1, comedy, actor2, drama2
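For reference, one such variation is sketched below. It is only a sketch: it assumes every relationship in the real data is an ActedIn relationship pointing from Actor to Movie, as in the toy setup, and it has not been checked for performance on the full data set. It drops the relationship variables so that repeated roles do not multiply the rows, excludes the case where both ends are the same drama, and asks for distinct rows.

```cypher
// Sketch only: assumes all edges are (:Actor)-[:ActedIn]->(:Movie), as in the toy data.
MATCH (drama1:Movie {Genre:'Drama'})<-[:ActedIn]-(actor1:Actor)-[:ActedIn]->(comedy:Movie {Genre:'Comedy'}),
      (comedy)<-[:ActedIn]-(actor2:Actor)-[:ActedIn]->(drama2:Movie {Genre:'Drama'})
// Keep only pairs of different dramas.
WHERE drama1 <> drama2
// No relationship variables are returned, so DISTINCT collapses rows
// that differ only in which of several parallel roles was matched.
RETURN DISTINCT drama1, actor1, comedy, actor2, drama2
```

Note that each pair of dramas still appears twice here, once in each order, and that actor1 and actor2 may be the same actor if that actor has more than one role in the comedy.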
