Reputation: 123
I have a large graph (1,068,029 nodes and 2,602,897 relationships). I work with it through the Python API and query it as part of my program flow.
I have the following two queries:
MATCH (start_node)--(o:observed_data)--(i:indicator)--(m:malware)--(end_node:attack_pattern)
WHERE start_node.id IN [id_list]
RETURN start_node.id, end_node.name

MATCH (start_node)--(o1:observed_data)--(h:MD5)--(o2:observed_data)--(i:indicator)--(m:malware)--(end_node:attack_pattern)
WHERE start_node.id IN [id_list]
RETURN start_node.id, end_node.name
When I run the first query with an id_list of 75,000 IDs, it completes and returns the expected output. When I run the second query, the graph gets stuck, even when I reduce the id_list to 20,000.
The full id_list is larger than 75,000, so I split it into chunks to keep the graph's response time down. If I split it into too many chunks, though, I increase the number of requests to the graph and the program's run time.
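For reference, the chunking described above can be sketched roughly as follows. This is only an illustration: run_query is a hypothetical callable standing in for whatever the Python API exposes, and the $ids parameter replaces the inline [id_list] so the query text stays the same for every chunk.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence

# First query from the question, with the ID list passed as a parameter
# ($ids) instead of being interpolated into the query string.
QUERY = """
MATCH (start_node)--(o:observed_data)--(i:indicator)--(m:malware)--(end_node:attack_pattern)
WHERE start_node.id IN $ids
RETURN start_node.id AS start_id, end_node.name AS end_name
"""


def chunked(ids: Sequence[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def run_in_chunks(
    run_query: Callable[[str, Dict[str, Any]], List[Any]],
    ids: Sequence[Any],
    size: int,
) -> List[Any]:
    """Run QUERY once per chunk and collect all rows.

    `run_query` is a hypothetical callable (query text, parameters) -> rows;
    it is an assumption of this sketch, not part of any specific library.
    """
    rows: List[Any] = []
    for chunk in chunked(ids, size):
        rows.extend(run_query(QUERY, {"ids": chunk}))
    return rows
```

Assuming the official neo4j Python driver, run_query could be something like lambda q, p: session.run(q, p).data(), but that depends on which client is actually in use.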
My question: is there a library function (APOC or something similar) that performs the same action in less time? Or is there another solution that does not require reducing the id_list below 50,000?
Upvotes: 0
Views: 317
Reputation: 67044
The (start_node) in your MATCH patterns should specify a label (like (start_node:Foo)), to avoid having to scan every node in the DB. Also, you should create an index (or uniqueness constraint) for that start node.
Make the relationships in your MATCH patterns directional, if appropriate. That is, put an arrow on one end or the other.
Specify relationship types in your patterns (like ()-[:BAR]->()), so that the query is not forced to evaluate all relationship types.
Upvotes: 1
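As a rough illustration of the three suggestions in the answer, the sketch below assembles the first query with a labeled start node and typed, directed relationships. Foo and BAR are the answer's placeholders, not real names from the asker's graph; the left-to-right direction of every relationship, the directed_path helper, and the $ids parameter are assumptions made for this example. The index statement uses the CREATE INDEX FOR ... ON ... form, which assumes Neo4j 4.x or later.

```python
from typing import List


def directed_path(nodes: List[str], rel_types: List[str]) -> str:
    """Join node patterns with typed relationships pointing left to right.

    Hypothetical helper for illustration only; the real graph may need
    some arrows reversed.
    """
    if len(rel_types) != len(nodes) - 1:
        raise ValueError("need exactly one relationship type between each pair of nodes")
    parts = [nodes[0]]
    for rel, node in zip(rel_types, nodes[1:]):
        parts.append(f"-[:{rel}]->{node}")
    return "".join(parts)


# Node patterns from the first query; only the start node label (Foo) is new.
NODES = [
    "(start_node:Foo)",
    "(o:observed_data)",
    "(i:indicator)",
    "(m:malware)",
    "(end_node:attack_pattern)",
]
# Placeholder relationship types; substitute the types used in the real graph.
RELS = ["BAR", "BAR", "BAR", "BAR"]

QUERY = (
    f"MATCH {directed_path(NODES, RELS)}\n"
    "WHERE start_node.id IN $ids\n"
    "RETURN start_node.id, end_node.name"
)

# Index on the property used in the WHERE clause (assumes Neo4j 4.x+ syntax).
CREATE_INDEX = "CREATE INDEX FOR (n:Foo) ON (n.id)"

if __name__ == "__main__":
    print(CREATE_INDEX)
    print(QUERY)
```

With a label and an index on the start node, the planner can look the IDs up directly instead of scanning all nodes, and the typed, directed relationships narrow what has to be expanded at each hop.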