Reputation: 115
I am new to Cypher querying. Below query has been running for the past few hours. Trying to infer relationships between 'T' nodes by using 'D' and 'R' nodes in the middle. Wanted to understand if there's a better way to write it.
MATCH (t:T)-[r1:T_OF]->(d:D)<-[r2:R_OF]-(m:R)-[r3:R_OF]->(e:D)<-[r4:T_OF]-(u:T)
WHERE t.name <> u.name AND d.name <> e.name
RETURN t.name, u.name, count(*) as degree
ORDER BY degree desc
Here's the count of each node and relationship type -
Nodes
T: 4,657
D: 2,458,733
R: 4,822
Relationships
T_OF: 4,915,004
R_OF: 284,548
Upvotes: 0
Views: 46
Reputation: 11705
You could add a clause to avoid computing both (t, u) and (u, t), that would reduce the size of the cartesian product by half:
MATCH (t:T)-[:T_OF]->(d:D)<-[:R_OF]-(:R)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
AND t.name <> u.name
AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
or maybe
MATCH (t:T)-[:T_OF]->(d:D)<-[:R_OF]-(:R)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE t.name < u.name
AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
which won't cost an extra read for the id.
It probably doesn't make a difference, but you can also avoid binding variables that you don't use (r1
, r2
, r3
, r4
, m
).
It's hard to optimize a query when you don't have the matching data and can't PROFILE
it. However, I see that you have much more T_OF
relationships than you have R_OF
, so maybe if you change the traversal order that will prune branches faster:
MATCH (m:R)-[:R_OF]->(d:D)<-[:T_OF]-(t:T)
MATCH (m)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
AND t.name <> u.name
AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
or even
MATCH (m:R)-[:R_OF]->(d:D)
MATCH (m)-[:R_OF]->(e:D)
WHERE d.name <> e.name
MATCH (d:D)<-[:T_OF]-(t:T), (e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
AND t.name <> u.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
You could also try to reduce the size of the first cartesian product with the same id()
trick (or ordering the names), but then you need to reassemble the couples at the end:
MATCH (m:R)-[:R_OF]->(d:D)
MATCH (m)-[:R_OF]->(e:D)
WHERE id(d) < id(e)
AND d.name <> e.name
MATCH (d:D)<-[:T_OF]-(t:T), (e:D)<-[:T_OF]-(u:T)
WHERE t.name <> u.name
WITH t.name AS name1, u.name AS name2, count(*) AS degree
WITH CASE WHEN name1 < name2 THEN name1 ELSE name2 END AS name1,
CASE WHEN name1 < name2 THEN name2 ELSE name1 END AS name2,
degree
RETURN name1, name2, sum(degree) AS degree
ORDER BY degree DESC
All these possibilities would need to be profiled (on a smaller set, or use EXPLAIN
to just get the plan, but that's just the theory and the profile is much more interesting) to see if they lead anywhere.
Upvotes: 1