Multiple Match with Cypher is super slow

Question

I started with Cypher yesterday and ran into a problem. My hierarchy looks like this:

ReleaseCycle->Line->ProductSet

So "ReleaseCycle" has the fewest nodes and sits at the very top of the hierarchy tree. The children of a "ReleaseCycle" are "Line" nodes, and the children of a "Line" are "ProductSet" nodes, which hold most of the data. Nodes at all three levels are connected to Task nodes through "releaseTask" relationships, and I want to return all of these releaseTask entries.
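To reproduce the behaviour, here is a small hypothetical sample graph in the shape described above. The ids, the contributor values and the relationship directions (ReleaseCycle-[:contains]->Line, Task-[:releaseTask]->node) are assumptions; the directions were chosen to match the patterns in the query in this question.

```cypher
// Hypothetical sample data: one ReleaseCycle, two Lines, one ProductSet per Line,
// and Task nodes attached at each level through releaseTask relationships.
CREATE (rc:ReleaseCycle {id: 'xyz'})
CREATE (l1:Line {id: 'line-1'}), (l2:Line {id: 'line-2'})
CREATE (ps1:ProductSet {id: 'ps-1'}), (ps2:ProductSet {id: 'ps-2'})
CREATE (rc)-[:contains]->(l1), (rc)-[:contains]->(l2)
CREATE (l1)-[:contains]->(ps1), (l2)-[:contains]->(ps2)
CREATE (:Task {id: 't-1'})-[:releaseTask {contributor: 'alice'}]->(rc)
CREATE (:Task {id: 't-2'})-[:releaseTask {contributor: 'bob'}]->(l1)
CREATE (:Task {id: 't-3'})-[:releaseTask {contributor: 'carol'}]->(ps1)
// line-2 deliberately has no task of its own
CREATE (:Task {id: 't-4'})-[:releaseTask {contributor: 'dave'}]->(ps2)
```

Because every MATCH clause must find at least one result for a row to survive, a Line without tasks of its own (line-2 here) removes its ProductSets from the result, so t-4 would not appear in the output of the query below. This may account for part of the missing entries.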

My current Cypher query looks like this:

MATCH (a1:Line)<-[:contains]-(b1:ReleaseCycle{id:'xyz'})<-[c1:releaseTask]-(d1:Task)      
MATCH (a2:ProductSet)<-[:contains]-(b2:Line{id: a1.id})<-[c2:releaseTask]-(d2:Task)  
MATCH (b3:ProductSet{id: a2.id})<-[c3:releaseTask]-(d3:Task)

RETURN COLLECT(DISTINCT {a:"cycle", task: d1, contributor: c1.contributor}),                             
COLLECT(DISTINCT {a:"Line", task: d2, contributor: c2.contributor}),
COLLECT(DISTINCT {a:"ps", task: d3, contributor: c3.contributor})

The problem is that it takes 5 seconds to respond, and entries seem to be missing: I get only 800 results, but there should be 3000. Maybe this is caused by DISTINCT and COLLECT(), but if I don't use them, the query takes so long that it times out.

Thanks for your help

InverseFalcon · Accepted Answer

If you PROFILE the query, you'll see that the number of records/rows being generated as the query executes is likely spiking. Cypher operations execute per record/row, so you're performing much more work than is needed.
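As an illustration, prefixing the question's MATCH clauses with PROFILE and returning only count(*) shows how many rows the three COLLECT calls had to work through; the profiled plan also reports the rows produced by each operator. This is a sketch, and the exact operators shown depend on the Neo4j version.

```cypher
// PROFILE executes the query and annotates each plan operator
// with the rows it produced and the database hits it caused.
PROFILE
MATCH (a1:Line)<-[:contains]-(b1:ReleaseCycle {id: 'xyz'})<-[c1:releaseTask]-(d1:Task)
MATCH (a2:ProductSet)<-[:contains]-(b2:Line {id: a1.id})<-[c2:releaseTask]-(d2:Task)
MATCH (b3:ProductSet {id: a2.id})<-[c3:releaseTask]-(d3:Task)
// count(*) = number of rows that reach the aggregation in the original query
RETURN count(*) AS rowsBeforeCollect
```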

The key with aggregation is to collect as early as possible (where it makes sense). This reduces cardinality (the number of records/rows in the query), which means fewer operations to execute and fewer redundant operations on the same nodes/data.
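To make the multiplication concrete, assume hypothetical, uniform counts for one ReleaseCycle (illustrative numbers, not measurements): T_c = 5 tasks on the cycle, L = 20 Lines, T_l = 10 tasks per Line, P = 50 ProductSets per Line and T_p = 3 tasks per ProductSet. The question's chained MATCH clauses produce one row per combination, while the result only needs one entry per releaseTask relationship:

```latex
\begin{aligned}
\text{rows}_{\text{MATCH}} &= T_c \cdot L \cdot T_l \cdot P \cdot T_p
  = 5 \cdot 20 \cdot 10 \cdot 50 \cdot 3 = 150\,000 \\
\text{entries}_{\text{needed}} &= T_c + L \cdot T_l + L \cdot P \cdot T_p
  = 5 + 200 + 3\,000 = 3\,205
\end{aligned}
```

Collecting each level as soon as it is matched avoids multiplying the levels against each other.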

In your case, it makes sense to do this in steps, and to use pattern comprehension as shorthand to match a pattern and collect results.
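A minimal comparison of the two forms at the cycle level, reusing the names from the question: the first expands to one row per task and then aggregates, the second builds the list inside a single pattern comprehension expression. One difference worth knowing: if the cycle has no tasks, the first form returns no row at all, while the second returns an empty list.

```cypher
// Long form: one row per matching task, then aggregated back to one row per release.
MATCH (release:ReleaseCycle {id: 'xyz'})<-[rel:releaseTask]-(task:Task)
RETURN release, collect({task: task, contributor: rel.contributor}) AS cycleTasks;

// Pattern comprehension: the list is built inside one expression,
// so the query never holds more than one row per release.
MATCH (release:ReleaseCycle {id: 'xyz'})
RETURN release,
       [(release)<-[rel:releaseTask]-(task:Task) |
         {task: task, contributor: rel.contributor}] AS cycleTasks;
```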

You can also reuse variables throughout your query instead of rematching nodes you already have (just keep in mind that WITH redefines which variables are in scope, so carry along any you still need).
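For comparison, here is rematching by property, as the question's second and third MATCH clauses do, next to expanding from the variable that is already bound. Besides skipping an extra lookup, the second form cannot pick up a Line from another ReleaseCycle that happens to share the same id, if id values are not unique across cycles (an assumption about the data, not something the question states).

```cypher
// Rematching: the Line is looked up again by property, although a1 is already bound.
MATCH (:ReleaseCycle {id: 'xyz'})-[:contains]->(a1:Line)
MATCH (b2:Line {id: a1.id})-[:contains]->(a2:ProductSet)
RETURN count(a2) AS productSets;

// Reusing the bound variable: expand directly from the Line node already in scope.
MATCH (:ReleaseCycle {id: 'xyz'})-[:contains]->(line:Line)
MATCH (line)-[:contains]->(ps:ProductSet)
RETURN count(ps) AS productSets;
```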

Try this one out, and if it gives good results, PROFILE it to see the difference:

MATCH (release:ReleaseCycle{id:'xyz'})
WITH release, [(release)<-[rel:releaseTask]-(task:Task) | 
 {a:"cycle", task:task, contributor:rel.contributor}] as cycleTasks   
MATCH (line:Line)<-[:contains]-(release)
WITH cycleTasks, line, [(line)<-[rel:releaseTask]-(task:Task) | 
 {a:"Line", task:task, contributor:rel.contributor}] as lineTasks
MATCH (prod:ProductSet)<-[:contains]-(line)
WITH cycleTasks, lineTasks, [(prod)<-[rel:releaseTask]-(task:Task) | 
 {a:"ps", task:task, contributor:rel.contributor}] as psTasks
RETURN cycleTasks, lineTasks, psTasks
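The query above still returns one row per ProductSet, with cycleTasks and lineTasks repeated on each row, and its MATCH clauses drop any Line that has no ProductSet (together with that Line's tasks). If you want a single row with three flat lists, as the question's RETURN produces, one possible variant builds each list from its own pattern comprehension anchored at the release node. This is an untested sketch that assumes the relationship directions used in the question.

```cypher
// One row: each list comes from its own pattern comprehension,
// so the three levels are never multiplied against each other.
MATCH (release:ReleaseCycle {id: 'xyz'})
RETURN
  [(release)<-[rel:releaseTask]-(task:Task) |
     {a: "cycle", task: task, contributor: rel.contributor}] AS cycleTasks,
  // tasks attached to any Line directly under this cycle
  [(release)-[:contains]->(:Line)<-[rel:releaseTask]-(task:Task) |
     {a: "Line", task: task, contributor: rel.contributor}] AS lineTasks,
  // tasks attached to any ProductSet under those Lines
  [(release)-[:contains]->(:Line)-[:contains]->(:ProductSet)<-[rel:releaseTask]-(task:Task) |
     {a: "ps", task: task, contributor: rel.contributor}] AS psTasks
```

Each pattern comprehension still visits every path it describes, so it is worth profiling this next to the version above on your real data.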

Also, make sure you have indexes on the label/property combinations that you plan to use when looking up your starting node/nodes.
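For the starting node used here, that means an index on :ReleaseCycle(id). The statements below use the named-index syntax of newer Neo4j releases (4.1 and later, as far as I know); the index names are arbitrary. On Neo4j 3.x the older form is CREATE INDEX ON :ReleaseCycle(id). If id is meant to be unique, a uniqueness constraint also gives you a backing index.

```cypher
// Index used to find the starting ReleaseCycle by id.
CREATE INDEX release_cycle_id IF NOT EXISTS FOR (r:ReleaseCycle) ON (r.id);
// Only needed if you keep looking up Lines and ProductSets by id,
// as the question's second and third MATCH clauses do.
CREATE INDEX line_id IF NOT EXISTS FOR (l:Line) ON (l.id);
CREATE INDEX product_set_id IF NOT EXISTS FOR (p:ProductSet) ON (p.id);
```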
