Reputation: 4513
I'm trying to parallelize a query to count vertices that have an edge of a given label in AWS Neptune on a large Graph by partitioning vertices by ID:
g.V().hasLabel('Label').
hasId(startingWith('prefix')).
where(outE('EdgeType')).
count()
however, the query seems to consume a lot of memory and I run into OOM exceptions. Is there an explanation for this? And, what would in general be a good strategy to parallelize/run such a query efficiently?
The graph has about ~500M vertices where ~100M have the label of interest and ~90% of those have an edge with the desired label.
Upvotes: 0
Views: 884
Reputation: 2759
Any of the text predicates (startingwith()
, endingWith()
, containing()
) are non-indexed operations in Neptune as Neptune does not maintain a native full-text-search index. This means that any query using those may need to perform a full or range "scan" of the graph to find the results and this can expensive (as Kelvin mentions). You can, however, leverage integration with OpenSearch [1] if these types of queries are common in your use case.
Also note that Neptune's execution model is presently designed for highly-concurrent, transactional queries. If you have queries that are going to touch a large portion of the dataset, you may need to break those queries up into multiple, parallel queries. Each Neptune instance has a number of query execution threads equal to 2x the number of vCPUs on an instance. Memory allocation is divided up such that a large portion of instance memory is reserved for buffer pool cache and the remaining memory is allocated for the OS of the instance and the execution threads. Each execution thread will use a portion of the memory allocated for those threads. So an Out-Of-Memory Exception occurs when the thread runs out of memory, not the instance running out of memory. If the query execution is running out of memory, you can try increasing the instance size (to allocate more memory to the threads), or you may need to divide the query into multiple parts and execute them concurrently.
[1] https://docs.aws.amazon.com/neptune/latest/userguide/full-text-search.html
Upvotes: 1