Reputation: 3
I have 2 separate operations:
MATCH ()<-[r:waitsFor]-()
SET r.active = true
and
MATCH (start:Job) WHERE has(start.startedAt) OR has(start.completedAt)
OPTIONAL MATCH (prev:Job)<-[r:waitsFor]-(start:Job)
SET r.active = false
They execute in about 5s for 4k nodes and 2k relations. However if do them at once using WITH
MATCH ()<-[r:waitsFor]-()
SET r.active = true
WITH null AS none
MATCH (start:Job) WHERE has(start.startedAt) OR has(start.completedAt)
OPTIONAL MATCH (prev:Job)<-[r:waitsFor]-(start:Job)
SET r.active = false
query does not execute in sensible time. I've read WITH docs. What do I not understand about WITH that causes such behavior?
Upvotes: 0
Views: 87
Reputation: 9952
Don't do that. As you say, they are separate queries, so run them separately. The way you run them now, the second query is executed once for every matched result in the first. An example from a tiny ontology graph that I am working with right now with 146 nodes labeled :OwlClass
. Consider the following two queries
- Query 1 -
MATCH (c:OwlClass)
RETURN COUNT(*)
Result:
COUNT(*)
146
Returned 1 row in 102 ms
Execution plan:
==> ColumnFilter(symKeys=[" INTERNAL_AGGREGATE01b2d10b-7bf4-43b4-a67d-fbe826c7a4dd"], returnItemNames=["COUNT(*)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["( INTERNAL_AGGREGATE01b2d10b-7bf4-43b4-a67d-fbe826c7a4dd,CountStar())"], _rows=1, _db_hits=0)
==> NodeByLabel(label="OwlClass", identifier="c", _rows=146, _db_hits=0)
- Query 2 -
MATCH (c:OwlClass)
WITH null as n
MATCH (d:OwlClass)
RETURN COUNT(*)
Result:
COUNT(*)
21316
Returned 1 row in 734 ms
Execution plan:
==> ColumnFilter(symKeys=[" INTERNAL_AGGREGATEe33d15c7-e357-4e32-9f7f-3fbc00dd58f6"], returnItemNames=["COUNT(*)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["( INTERNAL_AGGREGATEe33d15c7-e357-4e32-9f7f-3fbc00dd58f6,CountStar())"], _rows=1, _db_hits=0)
==> NodeByLabel(label="OwlClass", identifier="d", _rows=21316, _db_hits=0)
==> ColumnFilter(symKeys=["c", "n"], returnItemNames=["n"], _rows=146, _db_hits=0)
==> Extract(symKeys=["c"], exprKeys=["n"], _rows=146, _db_hits=0)
==> NodeByLabel(label="OwlClass", identifier="c", _rows=146, _db_hits=0)
- Query 3-
MATCH (c:OwlClass)
WITH COUNT(c) as count_c
MATCH (d:OwlClass)
RETURN count_c, COUNT(d) as count_d
Result:
count_c count_d
146 146
Returned 1 row in 153 ms
Execution plan:
==> ColumnFilter(symKeys=["count_c", " INTERNAL_AGGREGATE17be6af7-abbb-4ef7-8250-ff050d6ba7c6"], returnItemNames=["count_c", "count_d"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=["count_c"], aggregates=["( INTERNAL_AGGREGATE17be6af7-abbb-4ef7-8250-ff050d6ba7c6,Count(d))"], _rows=1, _db_hits=0)
==> NodeByLabel(label="OwlClass", identifier="d", _rows=146, _db_hits=0)
==> ColumnFilter(symKeys=[" INTERNAL_AGGREGATE67abdc54-9c44-4353-bf82-e4aff895141e"], returnItemNames=["count_c"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["( INTERNAL_AGGREGATE67abdc54-9c44-4353-bf82-e4aff895141e,Count(c))"], _rows=1, _db_hits=0)
==> NodeByLabel(label="OwlClass", identifier="c", _rows=146, _db_hits=0)
--
Read the execution plan from bottom up. The pertinent part in this case is how the number of rows changes. Compare this rough interpretation of queries 2 and 3.
Query 2
OwlClass
. There are now 146 matched patterns. OwlClass
. There are now 146*146=21316 matched patterns. Query 3
OwlClass
. There are now 146 matched patterns. OwlClass
. There are now 146 matched patterns.Upvotes: 1
Reputation: 1874
The WITH clause is used to 'carry forward' returned information to another part of the query. Using WITH null as none and moving forward to perform subsequent matches are susceptible to the creating of cartesian products within Neo4j. Depending upon the number of nodes, a cartesian product explosion could be massive and can take lots of time to go through.
So, IF the 'r' in the first query is required in the second query to narrow down the match, then you should use WITH r. Since it seems that isn't the case (because you are using WITH null as none), that may not be the solution.
Your case seems to me to be executing two separate queries in sequence. You should take a look at the UNION clause. It is used to join two queries and combine their results. The options you have are plain old UNION
or UNION ALL
(removes duplicates).
Upvotes: 0