Neo4j query to find indirect paths runs extremely slowly

Question

MATCH (a:Author {id:'author_1'}),
(art:Article {id:'PMID:21473878'})
WITH a, art
MATCH r=((a)-[*2..4]-(art))
RETURN r

In a database with roughly 1.3 million nodes and 8 million relationships, this query runs forever. Is there anything I can do to speed it up?

There are indexes on the id property of both :Author and :Article.

===============

InverseFalcon · Accepted Answer

When a pattern connects two nodes you have already looked up, the query planner can sometimes pick an inefficient plan.

Here, the planner takes one node as the start node, expands to every node reachable through the given pattern, and then filters all of those nodes to see whether each one is the node at the other end of the match. That is an unnecessary property access for every candidate, which gets expensive when the expansion reaches a large number of nodes.

The better approach is to look up both the start and end node via the index, expand from one of them, and use a hash join to determine which of the nodes reached by the expansion is the end node you're looking for. This way the id property is read only once, when the end node itself is looked up, instead of once for every node found at the end of the expansion.

The trick right now is getting the planner to choose this approach. This may work:

MATCH (a:Author {id:'author_1'}),
(art:Article {id:'PMID:21473878'})
MATCH r=((a)-[*2..4]-(end))
WHERE end = art
RETURN r

At the least, I'd expect this to be about as fast as your approach using an OPTIONAL MATCH.
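
To check which plan Neo4j actually picks, you can prefix the query with PROFILE. This is only a sketch of how I'd inspect it; the operator names mentioned afterwards are what I'd expect to see, and they can differ between Neo4j versions.

```cypher
// PROFILE executes the query and returns the plan that was used,
// with rows and db hits reported per operator.
PROFILE
MATCH (a:Author {id:'author_1'}),
      (art:Article {id:'PMID:21473878'})
MATCH r=((a)-[*2..4]-(end))
WHERE end = art
RETURN r
```

If the plan shows an index seek for each of the two nodes and the variable-length expand feeding a join or an equality check on the nodes themselves, you are probably getting the intended approach. If you instead see a filter on id sitting above the expand with a very large number of db hits, the planner is still using the slow one. Since PROFILE runs the query to completion, use EXPLAIN instead if the query never finishes; it shows the planned operators without executing anything.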
