plaskowski
plaskowski

Reputation: 3

Combine independent queries with WITH

I have 2 separate operations:

MATCH ()<-[r:waitsFor]-() 
  SET r.active = true 

and

MATCH (start:Job) WHERE has(start.startedAt) OR has(start.completedAt)
  OPTIONAL MATCH (prev:Job)<-[r:waitsFor]-(start:Job) 
SET r.active = false 

They execute in about 5s for 4k nodes and 2k relations. However if do them at once using WITH

MATCH ()<-[r:waitsFor]-() 
  SET r.active = true 
WITH null AS none
MATCH (start:Job) WHERE has(start.startedAt) OR has(start.completedAt)
  OPTIONAL MATCH (prev:Job)<-[r:waitsFor]-(start:Job) 
SET r.active = false 

query does not execute in sensible time. I've read WITH docs. What do I not understand about WITH that causes such behavior?

Upvotes: 0

Views: 87

Answers (2)

jjaderberg
jjaderberg

Reputation: 9952

Don't do that. As you say, they are separate queries, so run them separately. The way you run them now, the second query is executed once for every matched result in the first. An example from a tiny ontology graph that I am working with right now with 146 nodes labeled :OwlClass. Consider the following two queries

- Query 1 -

MATCH (c:OwlClass) 
RETURN COUNT(*)

Result:

COUNT(*)
146
Returned 1 row in 102 ms

Execution plan:

==> ColumnFilter(symKeys=["  INTERNAL_AGGREGATE01b2d10b-7bf4-43b4-a67d-fbe826c7a4dd"], returnItemNames=["COUNT(*)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["(  INTERNAL_AGGREGATE01b2d10b-7bf4-43b4-a67d-fbe826c7a4dd,CountStar())"], _rows=1, _db_hits=0)
==>   NodeByLabel(label="OwlClass", identifier="c", _rows=146, _db_hits=0)

- Query 2 -

MATCH (c:OwlClass) 
WITH null as n 
MATCH (d:OwlClass) 
RETURN COUNT(*)

Result:

COUNT(*)
21316
Returned 1 row in 734 ms

Execution plan:

==> ColumnFilter(symKeys=["  INTERNAL_AGGREGATEe33d15c7-e357-4e32-9f7f-3fbc00dd58f6"], returnItemNames=["COUNT(*)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["(  INTERNAL_AGGREGATEe33d15c7-e357-4e32-9f7f-3fbc00dd58f6,CountStar())"], _rows=1, _db_hits=0)
==>   NodeByLabel(label="OwlClass", identifier="d", _rows=21316, _db_hits=0)
==>     ColumnFilter(symKeys=["c", "n"], returnItemNames=["n"], _rows=146, _db_hits=0)
==>       Extract(symKeys=["c"], exprKeys=["n"], _rows=146, _db_hits=0)
==>         NodeByLabel(label="OwlClass", identifier="c", _rows=146, _db_hits=0)

- Query 3-

MATCH (c:OwlClass)
WITH COUNT(c) as count_c
MATCH (d:OwlClass)
RETURN count_c, COUNT(d) as count_d

Result:

count_c   count_d
146       146
Returned 1 row in 153 ms

Execution plan:

==> ColumnFilter(symKeys=["count_c", "  INTERNAL_AGGREGATE17be6af7-abbb-4ef7-8250-ff050d6ba7c6"], returnItemNames=["count_c", "count_d"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=["count_c"], aggregates=["(  INTERNAL_AGGREGATE17be6af7-abbb-4ef7-8250-ff050d6ba7c6,Count(d))"], _rows=1, _db_hits=0)
==>   NodeByLabel(label="OwlClass", identifier="d", _rows=146, _db_hits=0)
==>     ColumnFilter(symKeys=["  INTERNAL_AGGREGATE67abdc54-9c44-4353-bf82-e4aff895141e"], returnItemNames=["count_c"], _rows=1, _db_hits=0)
==>       EagerAggregation(keys=[], aggregates=["(  INTERNAL_AGGREGATE67abdc54-9c44-4353-bf82-e4aff895141e,Count(c))"], _rows=1, _db_hits=0)
==>         NodeByLabel(label="OwlClass", identifier="c", _rows=146, _db_hits=0)

--

Read the execution plan from bottom up. The pertinent part in this case is how the number of rows changes. Compare this rough interpretation of queries 2 and 3.

Query 2

  1. First get all nodes with label OwlClass. There are now 146 matched patterns.
  2. Continue each pattern by setting n to null.
  3. Continue each pattern by getting all nodes with label OwlClass. There are now 146*146=21316 matched patterns.
  4. Aggregate these by counting. There is now one pattern (the count). Return it.

Query 3

  1. First get all nodes with label OwlClass. There are now 146 matched patterns.
  2. Aggregate these by counting. There is now one pattern (the count).
  3. Continue each pattern by getting all nodes with label OwlClass. There are now 146 matched patterns.
  4. Aggregate these by counting. There is now one pattern (the count). Return it.

Upvotes: 1

arijeet
arijeet

Reputation: 1874

The WITH clause is used to 'carry forward' returned information to another part of the query. Using WITH null as none and moving forward to perform subsequent matches are susceptible to the creating of cartesian products within Neo4j. Depending upon the number of nodes, a cartesian product explosion could be massive and can take lots of time to go through.

So, IF the 'r' in the first query is required in the second query to narrow down the match, then you should use WITH r. Since it seems that isn't the case (because you are using WITH null as none), that may not be the solution.

Your case seems to me to be executing two separate queries in sequence. You should take a look at the UNION clause. It is used to join two queries and combine their results. The options you have are plain old UNION or UNION ALL (removes duplicates).

Upvotes: 0

Related Questions