mt88

Reputation: 3015

Spark Scala GraphX: Creating a Weighted Directed Graph

I have a DataFrame, dfMaster, with three columns: vertex1, vertex2, and weight. I'm trying to create a directed, weighted GraphX graph whose vertices come from vertex1 and vertex2, with an edge between each pair carrying the corresponding weight. I can create the edge and vertex DataFrames like this:

val edgeDF = dfMaster.select($"vertex1", $"vertex2", $"weight").distinct()
val vertexDF = (dfMaster.select("vertex1").toDF().unionAll(dfMaster.select("vertex2").toDF())).distinct()

How do I then load this into a weighted graph? Thanks for the help.

Upvotes: 3

Views: 2226

Answers (1)

Daniel de Paula

Reputation: 17872

As far as I know, Spark GraphX currently supports only creation from RDDs. The main methods available for graph creation can be found at the following classes:

For your case, I suggest the following code:

import org.apache.spark.sql.Row
import org.apache.spark.graphx.{Graph, Edge}

val edgeDF = dfMaster.select($"vertex1", $"vertex2", $"weight").distinct()

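// Convert each Row to a GraphX Edge; going through .rdd also works on Spark 2.x DataFrames
val edgesRDD = edgeDF.rdd.map {
  case Row(srcId: Double, dstId: Double, wgt: Double) => Edge[Double](srcId.toLong, dstId.toLong, wgt)
}

// Vertices are inferred from the edges; each one gets the default attribute 0
val graph = Graph.fromEdges[Int, Double](edgesRDD, 0)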

The fromEdges method above infers the vertices from the edges and sets 0 as their attribute.
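
To see that inference at work, here is a minimal sketch on a tiny in-memory edge list. The edge values, the names sampleEdges and sampleGraph, and the local SparkContext are made up for illustration; in spark-shell you would normally use the existing sc instead.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Local context purely for illustration (assumption); spark-shell already provides `sc`.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("fromEdgesSketch").setMaster("local[*]"))

// Three made-up weighted directed edges: 1 -> 2, 2 -> 3, 1 -> 3.
val sampleEdges = sc.parallelize(Seq(
  Edge(1L, 2L, 0.5),
  Edge(2L, 3L, 1.5),
  Edge(1L, 3L, 2.0)
))

// Vertices 1, 2 and 3 are inferred from the edges; each gets attribute 0.
val sampleGraph = Graph.fromEdges[Int, Double](sampleEdges, 0)

// Collect the inferred vertices and the edge weights for inspection.
val inferredVertices = sampleGraph.vertices.collect().sortBy(_._1)
val weights = sampleGraph.edges.map(_.attr).collect().sorted

// Print each directed edge with its weight (copied into tuples before collecting).
sampleGraph.triplets
  .map(t => (t.srcId, t.dstId, t.attr))
  .collect()
  .foreach { case (src, dst, w) => println(s"$src -> $dst (weight $w)") }
```

The edge direction is simply the order of srcId and dstId in each Edge, so no extra step is needed to make the graph directed.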

Assumptions:

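  • vertex1, vertex2 and weight are columns of type Double (any other type will not match the Row pattern and will fail with a MatchError);
  • There is no attribute information for the vertices, so it's OK if all of them are created with the attribute 0.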
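
If the second assumption does not hold and you do have per-vertex attributes, one option is to pass a vertex RDD to Graph(...) instead of using fromEdges. The following is only a sketch under that assumption: the String labels, the names labelledVertices, weightedEdges and labelledGraph, and the "unknown" default are all hypothetical and not part of the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Local context purely for illustration (assumption); spark-shell already provides `sc`.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("vertexAttrSketch").setMaster("local[*]"))

// Hypothetical vertex attributes (labels), keyed by vertex ID.
val labelledVertices: RDD[(VertexId, String)] = sc.parallelize(Seq(
  (1L, "a"),
  (2L, "b")
))

// Made-up weighted edges; vertex 3 has no entry in labelledVertices.
val weightedEdges: RDD[Edge[Double]] = sc.parallelize(Seq(
  Edge(1L, 2L, 0.5),
  Edge(2L, 3L, 1.5)
))

// Vertices that appear only in the edges receive the default attribute "unknown".
val labelledGraph = Graph(labelledVertices, weightedEdges, "unknown")

// Map from vertex ID to its label, for inspection.
val labels = labelledGraph.vertices.collect().toMap
```

With the question's data, the vertex RDD would be derived from vertexDF in the same way the edges are derived from edgeDF, pairing each ID with whatever attribute you want to attach.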

Upvotes: 4
