Ashok Krishnamoorthy

Reputation: 873

GraphX - Best way to store and compute over 3 billion vertices

I am new to Spark and GraphX. So far I have been using Titan (with HBase as the storage backend) and Giraph for processing. I need to build a graph with ~3 billion vertices and ~5 billion edges. What would be the best way to store the graph? I want to create it from scratch by adding vertices and edges, and I want to move away from the Titan API for graph creation. I have not been able to find any direct documentation on this. Can you suggest the best way to create, store, and process my graph using GraphX on commodity hardware?

Thanks.

Upvotes: 3

Views: 830

Answers (1)

Sietse

Reputation: 201

As long as you can read HBase tables into an RDD (which you can), there should be no issue. The HBaseTest example that ships with the Spark distribution will probably help you further.
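A rough sketch of what that could look like, following the same `newAPIHadoopRDD` + `TableInputFormat` approach as the HBaseTest example. The table name `edges`, the column family `e`, and the qualifiers `src` and `dst` (each holding an 8-byte long vertex id) are assumptions for illustration only, since the question does not describe the actual schema. Treat it as a starting point rather than a tuned setup for 3 billion vertices.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.storage.StorageLevel

object HBaseToGraphX {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HBaseToGraphX"))

    // Point the HBase input format at the (hypothetical) edge table.
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "edges")

    // One (row key, Result) pair per HBase row.
    val rows = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Assumed schema: family "e", qualifiers "src" and "dst" stored as longs.
    val family = Bytes.toBytes("e")
    val edges = rows.map { case (_, result) =>
      val src = Bytes.toLong(result.getValue(family, Bytes.toBytes("src")))
      val dst = Bytes.toLong(result.getValue(family, Bytes.toBytes("dst")))
      Edge(src, dst, 1) // edge attribute 1 is a placeholder
    }

    // Build the graph from edges only; vertices are created implicitly
    // with the default attribute 0. Storage levels allow spilling to disk.
    val graph = Graph.fromEdges(
      edges,
      defaultValue = 0,
      edgeStorageLevel = StorageLevel.MEMORY_AND_DISK_SER,
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK_SER)

    println(s"vertices=${graph.numVertices}, edges=${graph.numEdges}")
    sc.stop()
  }
}
```

`Graph.fromEdges` is used here because it avoids building a separate vertex RDD. If your vertices carry properties, you would read those into a second RDD and use `Graph(vertices, edges)` instead. If the edges already live in plain text files on HDFS, `GraphLoader.edgeListFile` is another option.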

Upvotes: 2
