Reputation: 165
I am checking the feasibility of exporting a Spark GraphX graph to the Titan graph database.
***I used the code below to construct a graph in Spark GraphX and write it to a JSON file:***
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

val conf = new SparkConf()
val sc = new SparkContext(conf.setAppName("========= GraphXTest ======="))
// Create an RDD for the vertices: (id, (name, role))
val users: RDD[(VertexId, (String, String))] = sc.parallelize(Array(
  (3L, ("rxin", "student")),
  (7L, ("jgonzal", "postdoc")),
  (5L, ("franklin", "prof"))
))
// Create an RDD for the edges: Edge(source id, destination id, relationship)
val relationships: RDD[Edge[String]] = sc.parallelize(Array(
  Edge(3L, 7L, "collab"),
  Edge(5L, 3L, "advisor")
))
// Build the initial graph
val graph = Graph(users, relationships)
// Write the vertices as text; Spark creates a directory with this name
graph.vertices.saveAsTextFile("D://Spark-GraphX-vertices.json")
Running the above code creates a folder with the name I specified (D://Spark-GraphX-vertices.json) and a few files inside it, but those files do not contain any data.
How can I export this graph from Spark GraphX to the Titan database?
Upvotes: 1
Views: 891
Reputation: 1702
You need to get your data into an adjacency list format for Titan to be able to read it in. Your best bet will be to export to a text file and use ScriptInputFormat to read it. For instance:
1:2,4,5,6
2:4,1,5
3:7,8,9,2
This format says that vertex 1 is connected to vertices 2, 4, 5, and 6. If your data set is small (< 100 million edges), just loop over the file and write the data with the OLTP API. In that case you don't strictly need the adjacency list format, though it still helps because at least one vertex of each edge will already be in cache. If your data set is large (billions of edges), you will need to use BulkLoaderVertexProgram to bulk load it into Titan. Here are some links to study:
http://tinkerpop.apache.org/docs/3.1.0-incubating/#bulkloadervertexprogram
http://tinkerpop.apache.org/docs/3.1.0-incubating/#sparkgraphcomputer (interestingly, you will use Spark to bulk load your graph)
http://tinkerpop.apache.org/docs/3.1.0-incubating/#script-io-format
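As a rough illustration of the adjacency list format described above, here is a minimal plain-Scala sketch that turns a list of vertex ids and directed edges into lines of the form `id:n1,n2,...`. The names `AdjacencyExport`, `toLine`, and `adjacencyLines` are hypothetical, and the sketch assumes you only want outgoing neighbours and are willing to drop vertex properties and edge labels. The sample ids and edges mirror the ones in the question.

```scala
object AdjacencyExport {
  // Formats one vertex as "id:n1,n2,..." (hypothetical helper, not a Titan or TinkerPop API).
  def toLine(id: Long, neighbors: Seq[Long]): String =
    s"$id:${neighbors.mkString(",")}"

  // Produces one line per vertex, sorted by id, listing the targets of its outgoing edges.
  // Assumption: edges are directed (source, destination) pairs.
  def adjacencyLines(vertexIds: Seq[Long], edges: Seq[(Long, Long)]): Seq[String] = {
    val outgoing: Map[Long, Seq[Long]] =
      edges.groupBy(_._1).map { case (src, es) => src -> es.map(_._2) }
    // Vertices without outgoing edges still get a line with an empty neighbour list.
    vertexIds.sorted.map(id => toLine(id, outgoing.getOrElse(id, Seq.empty)))
  }

  def main(args: Array[String]): Unit = {
    val vertexIds = Seq(3L, 7L, 5L)
    val edges = Seq((3L, 7L), (5L, 3L))
    adjacencyLines(vertexIds, edges).foreach(println)
  }
}
```

In GraphX, `graph.collectNeighborIds(EdgeDirection.Out)` should give you (vertex id, neighbour ids) pairs that can be mapped through a function like `toLine` and then written with `saveAsTextFile`. Whether a line with an empty neighbour list is accepted depends on the parse script you supply to ScriptInputFormat.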
Upvotes: 4