Ankit
Ankit

Reputation: 115

How do I optimize a multi-node Neo4J query?

I am new to Cypher querying. Below query has been running for the past few hours. Trying to infer relationships between 'T' nodes by using 'D' and 'R' nodes in the middle. Wanted to understand if there's a better way to write it.

MATCH (t:T)-[r1:T_OF]->(d:D)<-[r2:R_OF]-(m:R)-[r3:R_OF]->(e:D)<-[r4:T_OF]-(u:T) 
WHERE t.name <> u.name AND d.name <> e.name 
RETURN t.name, u.name, count(*) as degree 
ORDER BY degree desc

Here's the count of each node and relationship type -

Nodes
T: 4,657
D: 2,458,733
R: 4,822

Relationships
T_OF: 4,915,004
R_OF: 284,548

Upvotes: 0

Views: 46

Answers (1)

Frank Pavageau
Frank Pavageau

Reputation: 11705

You could add a clause to avoid computing both (t, u) and (u, t), that would reduce the size of the cartesian product by half:

MATCH (t:T)-[:T_OF]->(d:D)<-[:R_OF]-(:R)-[:R_OF]->(e:D)<-[:T_OF]-(u:T) 
WHERE id(t) < id(u)
  AND t.name <> u.name
  AND d.name <> e.name 
RETURN t.name, u.name, count(*) AS degree 
ORDER BY degree DESC

or maybe

MATCH (t:T)-[:T_OF]->(d:D)<-[:R_OF]-(:R)-[:R_OF]->(e:D)<-[:T_OF]-(u:T) 
WHERE t.name < u.name
  AND d.name <> e.name 
RETURN t.name, u.name, count(*) AS degree 
ORDER BY degree DESC

which won't cost an extra read for the id.

It probably doesn't make a difference, but you can also avoid binding variables that you don't use (r1, r2, r3, r4, m).

It's hard to optimize a query when you don't have the matching data and can't PROFILE it. However, I see that you have much more T_OF relationships than you have R_OF, so maybe if you change the traversal order that will prune branches faster:

MATCH (m:R)-[:R_OF]->(d:D)<-[:T_OF]-(t:T)
MATCH (m)-[:R_OF]->(e:D)<-[:T_OF]-(u:T) 
WHERE id(t) < id(u)
  AND t.name <> u.name
  AND d.name <> e.name 
RETURN t.name, u.name, count(*) AS degree 
ORDER BY degree DESC

or even

MATCH (m:R)-[:R_OF]->(d:D)
MATCH (m)-[:R_OF]->(e:D)
WHERE d.name <> e.name 
MATCH (d:D)<-[:T_OF]-(t:T), (e:D)<-[:T_OF]-(u:T) 
WHERE id(t) < id(u)
  AND t.name <> u.name
RETURN t.name, u.name, count(*) AS degree 
ORDER BY degree DESC

You could also try to reduce the size of the first cartesian product with the same id() trick (or ordering the names), but then you need to reassemble the couples at the end:

MATCH (m:R)-[:R_OF]->(d:D)
MATCH (m)-[:R_OF]->(e:D)
WHERE id(d) < id(e)
  AND d.name <> e.name 
MATCH (d:D)<-[:T_OF]-(t:T), (e:D)<-[:T_OF]-(u:T) 
WHERE t.name <> u.name
WITH t.name AS name1, u.name AS name2, count(*) AS degree
WITH CASE WHEN name1 < name2 THEN name1 ELSE name2 END AS name1,
     CASE WHEN name1 < name2 THEN name2 ELSE name1 END AS name2,
     degree
RETURN name1, name2, sum(degree) AS degree
ORDER BY degree DESC

All these possibilities would need to be profiled (on a smaller set, or use EXPLAIN to just get the plan, but that's just the theory and the profile is much more interesting) to see if they lead anywhere.

Upvotes: 1

Related Questions