Jayaprakash Narayanan

Reputation: 165

Can we export Spark GraphX graph data to Titan Graph Database?

I am checking the feasibility of exporting a Spark GraphX graph to the Titan graph database.

***I used the code below to construct a graph in Spark GraphX and write it to a JSON file:***


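    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx.{Edge, Graph, VertexId}
    import org.apache.spark.rdd.RDD

    val conf = new SparkConf()
    val sc = new SparkContext(conf.setAppName("========= GraphXTest ======="))

    // Create an RDD for the vertices: (id, (name, role))
    val users: RDD[(VertexId, (String, String))] = sc.parallelize(Array(
      (3L, ("rxin", "student")),
      (7L, ("jgonzal", "postdoc")),
      (5L, ("franklin", "prof"))
    ))

    // Create an RDD for the edges: Edge(source id, destination id, label)
    val relationships: RDD[Edge[String]] = sc.parallelize(Array(
      Edge(3L, 7L, "collab"),
      Edge(5L, 3L, "advisor")
    ))

    // Build the initial graph
    val graph = Graph(users, relationships)

    // Write the vertices as text; Spark creates a directory of part files at this path
    graph.vertices.saveAsTextFile("D://Spark-GraphX-vertices.json")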

Running the above code creates a folder with the name I specified (D://Spark-GraphX-vertices.json) and a few files inside it, but those files do not contain any data.

How can I export this graph from Spark GraphX to the Titan database?

Upvotes: 1

Views: 891

Answers (1)

Marko A. Rodriguez

Reputation: 1702

You need to get your data into an adjacency list format for Titan to be able to read it in. Your best bet will be to export to a text file and use ScriptInputFormat to read it. For instance:

    1:2,4,5,6
    2:4,1,5
    3:7,8,9,2

This format says that vertex 1 is connected to 2, 4, 5, and 6. If your data set is small (fewer than 100 million edges), just loop over your file and write the data through the OLTP API. In that case you don't strictly need adjacency list format, though it helps because at least one vertex of each edge will already be in cache. If your data set is large (billions of edges), you will need to bulk load it into Titan with BulkLoaderVertexProgram. Here are some links to study:

http://tinkerpop.apache.org/docs/3.1.0-incubating/#bulkloadervertexprogram

http://tinkerpop.apache.org/docs/3.1.0-incubating/#sparkgraphcomputer (interestingly, you will use Spark to bulk load your graph)

http://tinkerpop.apache.org/docs/3.1.0-incubating/#script-io-format
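As a rough sketch of the export step, the GraphX graph from the question could be flattened into the `id:neighbor,neighbor,...` text format described above. This is only an illustration under assumptions: the object name `AdjacencyListExport`, the helper `formatAdjacency`, the `local[*]` master, and the output path `graphx-adjacency` are all made up for the example, and vertex and edge properties are dropped. Getting those into Titan would need a richer line format and a matching parse script.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeDirection, Graph, VertexId}
import org.apache.spark.rdd.RDD

object AdjacencyListExport {

  // Render one vertex and its out-neighbours as "id:n1,n2,...".
  // A vertex without outgoing edges becomes "id:".
  def formatAdjacency(id: VertexId, neighbors: Array[VertexId]): String =
    s"$id:${neighbors.mkString(",")}"

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val conf = new SparkConf().setAppName("AdjacencyListExport").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Same sample data as in the question.
    val users: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
      (3L, ("rxin", "student")),
      (7L, ("jgonzal", "postdoc")),
      (5L, ("franklin", "prof"))
    ))
    val relationships: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(3L, 7L, "collab"),
      Edge(5L, 3L, "advisor")
    ))
    val graph = Graph(users, relationships)

    // For every vertex, collect the ids reachable over outgoing edges,
    // then turn each (id, neighbours) pair into one text line.
    val lines: RDD[String] = graph
      .collectNeighborIds(EdgeDirection.Out)
      .map { case (id, neighbors) => formatAdjacency(id, neighbors) }

    // Hypothetical output path; Spark writes a directory of part files
    // and fails if the directory already exists.
    lines.saveAsTextFile("graphx-adjacency")

    sc.stop()
  }
}
```

The resulting part files contain one line per vertex, which is the shape a ScriptInputFormat parse script or a simple OLTP loading loop would then read.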

Upvotes: 4
