Reputation: 3015
I have a DataFrame dfMaster with three columns: vertex1, vertex2 and weight. I'm trying to create a GraphX directed weighted graph whose vertices come from vertex1 and vertex2, with an edge between each pair carrying the corresponding weight. I can create the edge and vertex DataFrames by doing:
val edgeDF = dfMaster.select($"vertex1", $"vertex2", $"weight").distinct()
val vertexDF = (dfMaster.select("vertex1").toDF().unionAll(dfMaster.select("vertex2").toDF())).distinct()
How do I then load this into a weighted graph? Thanks for the help.
Upvotes: 3
Views: 2226
Reputation: 17872
As far as I know, Spark GraphX currently supports graph creation only from RDDs. The main graph-creation methods, such as Graph.apply, Graph.fromEdges and Graph.fromEdgeTuples, all take RDDs as input.
For your case, I suggest the following code:
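import org.apache.spark.sql.Row
import org.apache.spark.graphx.{Graph, Edge}
val edgeDF = dfMaster.select($"vertex1", $"vertex2", $"weight").distinct()
// Drop to the underlying RDD[Row] and turn each row into a GraphX Edge (vertex ids must be Long)
val edgeRDD = edgeDF.rdd.map {
  case Row(srcId: Double, dstId: Double, wgt: Double) => Edge[Double](srcId.toLong, dstId.toLong, wgt)
}
// Vertices are inferred from the edges; 0 is the default vertex attribute
val graph = Graph.fromEdges[Int, Double](edgeRDD, 0)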
The fromEdges method above infers the vertices from the edges and sets 0 as their attribute.
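To make that behaviour concrete, here is a minimal sketch that can be run outside the question's setup. The local SparkContext, the app name and the two sample edges are assumptions for illustration only; in spark-shell, sc already exists and the context creation can be skipped.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Assumption: a throwaway local context, only for experimenting
val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("fromEdgesSketch"))

// Hypothetical sample edges: Edge(srcId, dstId, weight)
val sampleEdges = sc.parallelize(Seq(
  Edge(1L, 2L, 0.5),
  Edge(2L, 3L, 1.5)
))

// Vertices 1, 2 and 3 are inferred from the edges, each with attribute 0
val sampleGraph = Graph.fromEdges[Int, Double](sampleEdges, 0)

sampleGraph.vertices.collect().foreach(println)
sampleGraph.edges.collect().foreach(println)
```

If the vertices need real attributes instead of a shared default, Graph.apply takes a separate vertex RDD alongside the edge RDD.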
Assumptions:
vertex1, vertex2 and weight are columns of Double;
every vertex gets the default attribute 0.
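If the columns are not actually Double (integer ids, for example), the Row pattern match would fail with a MatchError at runtime. One way around this, sketched here under the assumption that all three columns are numeric and non-null, is to cast the columns explicitly and read them by position. The SparkSession (Spark 2.x or later) and the sample dfMaster below are stand-ins so the snippet runs on its own; with the real dfMaster only the select/map part is needed.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session and a tiny stand-in for the question's dfMaster
val spark = SparkSession.builder().master("local[*]").appName("castSketch").getOrCreate()
import spark.implicits._
val dfMaster = Seq((1, 2, 0.5), (2, 3, 1.5)).toDF("vertex1", "vertex2", "weight")

// Cast to the types GraphX expects, then read each field by position
val castEdges = dfMaster
  .select(
    col("vertex1").cast("long"),
    col("vertex2").cast("long"),
    col("weight").cast("double"))
  .distinct()
  .rdd
  .map(row => Edge(row.getLong(0), row.getLong(1), row.getDouble(2)))

val castGraph = Graph.fromEdges[Int, Double](castEdges, 0)
```

Rows containing nulls would still fail in getLong or getDouble, so they should be filtered out beforehand if they can occur.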
Upvotes: 4