Spec

Reputation: 847

Neo4j more specific query slower than more generic one

I'm trying to count all values collected in one subtree of my graph. I assumed that the more specific the path I provide from the root node, the faster the query would run. Unfortunately that isn't true in my case, and I can't figure out why.

Original, slow query:

MATCH (s:Sandbox {name: "sandbox"})<--(root)-[:has_metric]->(n:Metric)-[:most_recent|:prev*0..]->(v:Value) return count(v)

PROFILE for the slower query reports 38397 total db hits in 2203 ms.

However, without matching the top-level node labeled Sandbox, the query is more than 10 times faster:

MATCH (root)-[:has_metric]->(n:Metric)-[:most_recent|:prev*0..]->(v:Value) return count(v)

PROFILE for the faster query reports 38478 total db hits in 159 ms.

To be clear, both queries return the same result in this case because I have just one Sandbox.

What is wrong with my first query? How should I model and query a hierarchy like this? I could save the sandbox name as a property on the Metric node, which executes faster, but it seems uglier to me.

Upvotes: 2

Views: 124

Answers (1)

Tezra

Reputation: 8833

Because the two queries are not identical.

(Aligned here so the difference is easy to see.)

MATCH (s:Sandbox {name: "sandbox"})<--(root)-[:has_metric]->(n:Metric)-[:most_recent|:prev*0..]->(v:Value) return count(v)
MATCH                                 (root)-[:has_metric]->(n:Metric)-[:most_recent|:prev*0..]->(v:Value) return count(v)

In the second query, Neo4j doesn't care about (root). You never use root, and its existence is already implied by [:has_metric], so Neo4j can skip straight to finding ()-[:has_metric]->(n:Metric)-[:most_recent|:prev*0..]->(v:Value). In the first query, Neo4j also has to find the Sandbox nodes, and on top of that it has to prove that root is connected to one of them. The extra part of the pattern can also add more rows to the intermediate results, which can mean more validation checks in the rest of the query.

Long story short, the first query is slower because it does more validation work. In general, the Value nodes matched by the first query are a subset of those matched by the second.
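If you want to keep the Sandbox restriction, one thing worth trying is to resolve root first and only then expand into the metrics, so the expensive variable-length part starts from an already-validated root. This is only a sketch under assumptions, not a guaranteed fix: it reuses the labels and relationship types from the question, assumes each root is attached to a single Sandbox, and whether it is actually faster depends on your data and on what the planner does, so compare the PROFILE output of both versions.

// Sketch: anchor on the Sandbox, keep each root once, then expand.
// Note: DISTINCT changes the count if a root reaches the same Sandbox
// through several relationships (the original query would count those
// values once per relationship).
PROFILE
MATCH (s:Sandbox {name: "sandbox"})<--(root)
WITH DISTINCT root
MATCH (root)-[:has_metric]->(n:Metric)-[:most_recent|:prev*0..]->(v:Value)
RETURN count(v)

Starting from a single Sandbox and deduplicating root before the expansion keeps the number of intermediate rows small, which is the part the first query struggles with.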

Upvotes: 1
