I am new to Spark and GraphX. So far I have been using Titan DB (with HBase storage) and Giraph for processing. I need a graph with ~3 billion vertices and ~5 billion edges. What would be the best way to store it? I want to create the graph from scratch by adding vertices and edges, and I also want to move away from the Titan API for graph creation. I have not been able to find any direct documentation on this. Can you suggest the best way to create and store my graph and then process it with GraphX on commodity hardware?
Thanks.
As long as you can read your HBase tables into RDDs (which you can), there should be no issue. The HBaseTest example that ships with the Spark distribution will probably help you further.
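As a rough sketch of that approach, assuming a hypothetical HBase layout that is not from the question: a `vertices` table whose row key is the vertex id encoded as an 8-byte `Long`, with a `d:name` column, and an `edges` table with `d:src`, `d:dst` (8-byte `Long`s) and `d:label` columns. Each table is scanned with `TableInputFormat` (as HBaseTest does), every `Result` is immediately turned into plain values, and the two RDDs are passed to GraphX's `Graph(...)` constructor. Note that GraphX vertex ids are 64-bit `Long`s, which is what you need for ~3 billion vertices. The table and column names and the `parseVertex`/`parseEdge`/`col` helpers are illustrative only; `TableInputFormat` lives in `hbase-server` (HBase 1.x) or `hbase-mapreduce` (HBase 2.x), so adjust dependencies to your versions.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.apache.spark.{SparkConf, SparkContext}

object HBaseToGraphX {
  // Hypothetical schema: both tables use column family "d".
  val Family: Array[Byte] = Bytes.toBytes("d")

  // Decode a vertex row: 8-byte Long row key (the GraphX VertexId) plus a name column.
  def parseVertex(rowKey: Array[Byte], name: Array[Byte]): (VertexId, String) =
    (Bytes.toLong(rowKey), Bytes.toString(name))

  // Decode an edge row: source and destination ids (8-byte Longs) plus a label.
  def parseEdge(src: Array[Byte], dst: Array[Byte], label: Array[Byte]): Edge[String] =
    Edge(Bytes.toLong(src), Bytes.toLong(dst), Bytes.toString(label))

  // Scan a whole HBase table into an RDD, the same way Spark's HBaseTest example does.
  def scanTable(sc: SparkContext, table: String): RDD[(ImmutableBytesWritable, Result)] = {
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, table)
    sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
  }

  // Read one column from the hypothetical "d" family.
  def col(row: Result, name: String): Array[Byte] = row.getValue(Family, Bytes.toBytes(name))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HBaseToGraphX"))

    // Convert each HBase Result to plain values right away (Result is not Java-serializable).
    val vertices: RDD[(VertexId, String)] = scanTable(sc, "vertices")
      .map { case (_, row) => parseVertex(row.getRow, col(row, "name")) }

    val edges: RDD[Edge[String]] = scanTable(sc, "edges")
      .map { case (_, row) => parseEdge(col(row, "src"), col(row, "dst"), col(row, "label")) }

    // Keep partitions that do not fit in memory on disk instead of recomputing them.
    val graph = Graph(vertices, edges, defaultVertexAttr = "unknown",
      edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)

    println(s"vertices=${graph.numVertices} edges=${graph.numEdges}")
    sc.stop()
  }
}
```

`defaultVertexAttr` is assigned to any vertex that appears in an edge but has no row in the `vertices` table. With `MEMORY_AND_DISK`, partitions that do not fit in memory are stored on disk rather than recomputed, which matters at this scale on commodity machines.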