Reputation: 580
I have modeled the following information in neo4j. I have a 100 vertices A1,A2,...A100. Each vertex sends data to 5 other random vertices. 10 different kinds of data is being sent and each is tracked with a link between the vertices. So, for the scenario here, there are A1 has 50 outbound edges with the label sends-to
. Each of those 5 send all 8 kinds of data to 4 more. These 4 send 8 kind of data to to 3 more. these 3 send 6 kind of data to 2 more. In all there are about 100 vertices and 30000 edges with the label sends_to
with a difference in properties. When I write the following query to identify if there is a connection between A1 and A94, the query takes forever and eventually failing asking to increase the dbms.memory.heap.max_size
property.
MATCH p=(a:Entity{name: 'A 0'})-[r:SENDS_TO*..]->(d:Entity{name: 'A 94'})
RETURN p;
I increased it from 1G to 4G but still facing the same problem. I have an index on the name property of the Entity
node. There may be a cycle in the graph that is causing this.
Upvotes: 2
Views: 922
Reputation: 30397
to identify if there is a connection between A1 and A94
This is key. You're not actually looking for all possible paths (that's what your MATCH is trying to do) between the two nodes, but to check if there is at least one path. So your query is doing far far far more work than it actually needs to.
There are a few ways to modify your query to do less work (but first make sure you have an index on :Entity(name)
for fast initial lookups):
MATCH p=(a:Entity{name: 'A 0'})-[r:SENDS_TO*..]->(d:Entity{name: 'A 94'})
RETURN p
LIMIT 1
MATCH (a:Entity{name: 'A 0'}), (d:Entity{name: 'A 94'})
MATCH p = shortestPath((a)-[r:SENDS_TO*..]->(d))
RETURN p
MATCH (a:Entity{name: 'A 0'}), (d:Entity{name: 'A 94'})
CALL apoc.path.spanningTree(a, {relationshipFilter:'SENDS_TO>', endNodes:[d], limit:1}) YIELD path
RETURN path
Upvotes: 0
Reputation: 66999
A MATCH
clause avoids traversing the same relationship twice, so cycles involving going over the same relationship multiple times would not be the issue.
The root of the problem is that your graph is extremely dense. You have only 100 nodes, but on average every node has 300 relationships. This means that every node has about 3 relationships to every other node. Therefore, there are likely to be an astronomical number of unique paths between any 2 nodes, and your MATCH
clause is trying to find all those paths (and keep them all in memory).
A simple way to reduce the memory requirement (and the processing time) is to put a reasonable upper bound on the path length. You will not find all possible paths, but the shorter paths are of more interest to most use cases anyway.
For example, to find paths up to length 6:
MATCH p=(a:Entity{name: 'A 0'})-[r:SENDS_TO*..6]->(d:Entity{name: 'A 94'})
RETURN p;
Upvotes: 3