Reputation: 580

neo4j finding all the connections very slow

I have modeled the following information in neo4j. I have a 100 vertices A1,A2,...A100. Each vertex sends data to 5 other random vertices. 10 different kinds of data is being sent and each is tracked with a link between the vertices. So, for the scenario here, there are A1 has 50 outbound edges with the label sends-to. Each of those 5 send all 8 kinds of data to 4 more. These 4 send 8 kind of data to to 3 more. these 3 send 6 kind of data to 2 more. In all there are about 100 vertices and 30000 edges with the label sends_to with a difference in properties. When I write the following query to identify if there is a connection between A1 and A94, the query takes forever and eventually failing asking to increase the dbms.memory.heap.max_size property.

MATCH p=(a:Entity{name: 'A 0'})-[r:SENDS_TO*..]->(d:Entity{name: 'A 94'}) RETURN p;

I increased it from 1G to 4G but still facing the same problem. I have an index on the name property of the Entity node. There may be a cycle in the graph that is causing this.

Upvotes: 2

Answers (2)

InverseFalcon

Reputation: 30397

to identify if there is a connection between A1 and A94

This is key. You're not actually looking for all possible paths (that's what your MATCH is trying to do) between the two nodes, but to check if there is at least one path. So your query is doing far far far more work than it actually needs to.

There are a few ways to modify your query to do less work (but first make sure you have an index on :Entity(name) for fast initial lookups):

LIMIT your results to 1, which will cause the query to stop once the first match is found:

MATCH p=(a:Entity{name: 'A 0'})-[r:SENDS_TO*..]->(d:Entity{name: 'A 94'})
RETURN p
LIMIT 1

Use a shortestPath() MATCH function to perform a bi-directional breadth-first expansion until a shortest path is found between the two nodes:

MATCH (a:Entity{name: 'A 0'}), (d:Entity{name: 'A 94'})
MATCH p = shortestPath((a)-[r:SENDS_TO*..]->(d))
RETURN p

Use path expander procedures from APOC Procedures, specifying an end node and limiting results to 1:

 MATCH (a:Entity{name: 'A 0'}), (d:Entity{name: 'A 94'})
 CALL apoc.path.spanningTree(a, {relationshipFilter:'SENDS_TO>', endNodes:[d], limit:1}) YIELD path
 RETURN path

Upvotes: 0

cybersam

Reputation: 66999

A MATCH clause avoids traversing the same relationship twice, so cycles involving going over the same relationship multiple times would not be the issue.

The root of the problem is that your graph is extremely dense. You have only 100 nodes, but on average every node has 300 relationships. This means that every node has about 3 relationships to every other node. Therefore, there are likely to be an astronomical number of unique paths between any 2 nodes, and your MATCH clause is trying to find all those paths (and keep them all in memory).

A simple way to reduce the memory requirement (and the processing time) is to put a reasonable upper bound on the path length. You will not find all possible paths, but the shorter paths are of more interest to most use cases anyway.

For example, to find paths up to length 6:

MATCH p=(a:Entity{name: 'A 0'})-[r:SENDS_TO*..6]->(d:Entity{name: 'A 94'})
RETURN p;

Upvotes: 3

neo4j finding all the connections very slow

Answers (2)

Related Questions