Reputation: 137
In an AWS Neptune graph with billions of nodes and edges, how would one go about finding the largest connected components efficiently? I ask because large connected components in my domain usually indicate fraud. Most nodes in my graph are connected to only tens of other nodes, so it is suspicious when a node is connected to hundreds or thousands of others.
I have several questions:
Any help is much appreciated!
Upvotes: 2
Views: 828
Reputation: 14371
Queries that find connected components can be expressed in Gremlin, but whether those queries will be efficient depends on the size and complexity of the graph. I would start by looking at the Gremlin Recipes document.
You will find several algorithms discussed there.
At very large scale, you may want to export the data from the graph and run a Spark job (or similar) to find the fraud rings and other large components.
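To illustrate the export-and-process idea, here is a minimal single-machine sketch of what such a job computes, assuming the graph has been exported as a list of (source, target) vertex ID pairs. It uses a union-find structure rather than Spark, and the function names and sample edges are made up for the example. At billions of edges you would want a distributed equivalent, but the logic is the same.

```python
from collections import Counter


def find(parent, x):
    """Return the root of x's component, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def component_sizes(edges):
    """Map each component root to the number of vertices in that component."""
    parent = {}
    for a, b in edges:
        # Register each vertex as its own root the first time it is seen.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components
    return Counter(find(parent, v) for v in parent)


def largest_components(edges, top_n=3):
    """Return the sizes of the top_n largest components, largest first."""
    sizes = component_sizes(edges)
    return sorted(sizes.values(), reverse=True)[:top_n]


# Hypothetical exported edge list: (source vertex ID, target vertex ID).
edges = [("a", "b"), ("b", "c"), ("d", "e"), ("f", "f")]
print(largest_components(edges))  # [3, 2, 1]
```

Components whose size is far above the typical value would then be the candidates to inspect for fraud.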
UPDATE 2024-01-29
In December 2023, Neptune Analytics was released. It includes support for built-in algorithms, including many that can be used for community detection use cases. The algorithms are described in the Neptune Analytics documentation.
Upvotes: 2