cechode
cechode

Reputation: 1022

having difficulties constructing an efficient query

Given a model similar to the movie database with 3000 actors and 5000 movies how how would you go about finding the most common groups of actors who work together?

i've tried things like

match (a1:Actor)-[:ACTED_IN]-(m:Movie)-[:ACTED_IN]-(a2:Actor)
where a1<>a2
return distinct a1, count(m) as movieCount, a2  limit 999

but this takes the server to 99% cpu and never comes back. moreover it will only give me pairs, rather than groups if.

i've tried to length(collect (m)) or collect a2 but neither seem to return something correlating to my target.

anywho if you feel like taking a stab at it, and providing a pointer or two it would be greatly appreciated.

Upvotes: 2

Views: 68

Answers (1)

Jim Biard
Jim Biard

Reputation: 2272

To get the first query to work, try

MATCH (a1:Actor)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(a2:Actor)
WITH a1, a2, COUNT(m) AS c
WHERE c > 1
RETURN DISTINCT a1, a2, c ORDER BY c DESC LIMIT 999

Your WHERE clause is not required.

The best way I can think of to solve the larger problem is to create Group nodes and then study the results. Here's a query that will do it. It may take a while.

MATCH (m1:Movie)<-[:ACTED_IN]-(a:Actor)-[:ACTED_IN]->(m2:Movie)
WITH m1, m2, COLLECT(a) AS actors, COLLECT(id(a)) as actorIDs
WHERE LENGTH(actors) > 1
MERGE (g:Group {actors : actorIDs})
    ON CREATE SET g.count = 1
    ON MATCH SET g.count = g.count + 1
WITH g, actors
UNWIND actors AS an
MERGE (g)<-[:PART_OF]-(an)

I tested this on a graph of 300 actors and 500 movies with 10 randomly selected actors in each movie. It took 48 seconds to construct the groups. Once you have the groups, you can filter and/or sort on the number of actors, the count of times the group is seen, etc.

I found that the collection of actor node IDs was always sorted. If this isn't true the query may not work right, because the merge on an existing :Group node may not succeed. In that case, you must use ORDER BY on the :Actor node IDs before collecting them.

Upvotes: 1

Related Questions