firefly2442

Reputation: 557

Improve Neo4j Cypher Performance On Lengthy Match

Setup:

I have the following Cypher query whose performance I would like to improve:

START a=node(2) MATCH (a)-[:knowledge]-(x)-[:depends]-(y)-[:knowledge]-(end) RETURN COUNT(DISTINCT end);

This returns 471 (188171 ms).

Right now I only need the count, but later I may want the values themselves (all 471 in this example). The problem is that the query takes about 3-4 minutes to run.

The graph is highly connected, with many relationships. Running the following shows how many edges of type "knowledge" exist for node 2 (a).

START a=node(2) MATCH (a)-[:knowledge]-(x) RETURN COUNT(a);

This returns 4350 (103 ms).

To me, this doesn't seem like many edges to check. Can I split this up somehow to improve performance?

edit: As per the comments, here are the results from running the query with profile:

profile START a=node(2) MATCH (a)-[:knowledge]-(x)-[:depends]-(y)-[:knowledge]-(end) RETURN COUNT(DISTINCT end);
==> +---------------------+
==> | COUNT(DISTINCT end) |
==> +---------------------+
==> | 471                 |
==> +---------------------+
==> 1 row
==> 
==> ColumnFilter(symKeys=["  INTERNAL_AGGREGATEcd2aff18-1c9d-47a8-9217-588cb86bbc1a"], returnItemNames=["COUNT(DISTINCT end)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["(  INTERNAL_AGGREGATEcd2aff18-1c9d-47a8-9217-588cb86bbc1a,Distinct)"], _rows=1, _db_hits=0)
==>   TraversalMatcher(trail="(a)-[  UNNAMED7:knowledge WHERE true AND true]-(x)-[  UNNAMED8:depends WHERE true AND true]-(y)-[  UNNAMED9:knowledge WHERE true AND true]-(end)", _rows=25638262, _db_hits=25679365)
==>     ParameterPipe(_rows=1, _db_hits=0)

Upvotes: 0

Views: 484

Answers (1)

firefly2442

Reputation: 557

I ended up doing the following to improve performance:

profile START a=node(2) MATCH (a)-[:knowledge]-(x) WITH DISTINCT x MATCH (x)-[:depends]-(y) WITH DISTINCT y MATCH (y)-[:knowledge]-(end) WITH DISTINCT end RETURN COUNT(end);
==> +------------+
==> | COUNT(end) |
==> +------------+
==> | 471        |
==> +------------+
==> 1 row
==> 
==> ColumnFilter(symKeys=["  INTERNAL_AGGREGATE1967576a-d357-457a-b799-adbb16b93048"], returnItemNames=["COUNT(end)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["(  INTERNAL_AGGREGATE1967576a-d357-457a-b799-adbb16b93048,Count)"], _rows=1, _db_hits=0)
==>   Distinct(_rows=471, _db_hits=0)
==>     PatternMatch(g="(end)-['  UNNAMED3']-(y)", _rows=403437, _db_hits=0)
==>       Distinct(_rows=735, _db_hits=0)
==>         PatternMatch(g="(x)-['  UNNAMED2']-(y)", _rows=1653, _db_hits=0)
==>           Distinct(_rows=177, _db_hits=0)
==>             TraversalMatcher(trail="(a)-[  UNNAMED1:knowledge WHERE true AND true]-(x)", _rows=4350, _db_hits=4351)
==>               ParameterPipe(_rows=1, _db_hits=0)

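Splitting the pattern into one hop per MATCH and applying WITH DISTINCT after each hop collapses duplicate intermediate nodes before the next expansion, so the query only expands each node once per step. In the profile above, the 4350 rows from the first hop reduce to 177 distinct x nodes, the 1653 rows from the second hop reduce to 735 distinct y nodes, and the final hop produces 403437 rows instead of the 25638262 rows that the single MATCH traversed.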
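
The effect of deduplicating between hops can be illustrated with a toy model in Python. This is only a sketch, not Neo4j's actual execution: the graph, node names, and the helper functions `build_graph`, `single_match`, and `staged_match` are invented for illustration, and the model ignores Cypher's rule that a relationship cannot be reused within a single MATCH. It counts how many rows each approach produces while walking knowledge, depends, knowledge from a start node.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency: graph[rel_type][node] -> list of neighbours."""
    graph = defaultdict(lambda: defaultdict(list))
    for rel_type, u, v in edges:
        graph[rel_type][u].append(v)
        graph[rel_type][v].append(u)
    return graph


def single_match(graph, start, rel_types):
    """Expand every path with no deduplication, like one long MATCH."""
    rows = [start]
    expanded = 0
    for rel in rel_types:
        rows = [nbr for node in rows for nbr in graph[rel][node]]
        expanded += len(rows)
    return len(set(rows)), expanded


def staged_match(graph, start, rel_types):
    """Deduplicate after each hop, like WITH DISTINCT between MATCH clauses."""
    frontier = {start}
    expanded = 0
    for rel in rel_types:
        rows = [nbr for node in frontier for nbr in graph[rel][node]]
        expanded += len(rows)
        frontier = set(rows)  # keep each intermediate node only once
    return len(frontier), expanded


# Hypothetical toy data: many parallel "knowledge" edges from the start node,
# mimicking a start node whose edges lead to far fewer distinct neighbours.
edges = []
for i in range(2):
    edges += [("knowledge", "a", f"x{i}")] * 10
    for j in range(2):
        edges.append(("depends", f"x{i}", f"y{j}"))
for j in range(2):
    for k in range(3):
        edges.append(("knowledge", f"y{j}", f"e{k}"))

graph = build_graph(edges)
HOPS = ["knowledge", "depends", "knowledge"]

print(single_match(graph, "a", HOPS))  # (distinct end nodes, rows produced)
print(staged_match(graph, "a", HOPS))
```

Both functions find the same distinct end nodes, but the staged version produces far fewer intermediate rows because duplicates are dropped before they can be multiplied by the next hop.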

Upvotes: 2
