samol
samol

Reputation: 20580

How to join in spark graphx given multiple vertex types

I am relatively new to spark graphx. Basically my graph has:

  1. 2 vertex types: person and car
  2. edge describes which person owns which car

I want to given all the person vertices in the graph, traverse the edges to collect a list of cars for each person

e.g.

person1 -> [car1, car2]
person2 -> [car3]

Upvotes: 1

Views: 1093

Answers (1)

ulrich
ulrich

Reputation: 3587

You can achieve this with a bit of SQL.

Let's assume that you have the following graph:

import org.apache.spark.graphx
import org.apache.spark.rdd.RDD

// Create an RDD for the vertices
val v: RDD[(VertexId, (String))] =
  sc.parallelize(Array((1L, ("car1")), (2L, ("car2")),
                       (3L, ("car3")), (4L, ("person1")),(5L, ("person2"))))
// Create an RDD for edges
val e: RDD[Edge[Int]] =
  sc.parallelize(Array(Edge(4L, 1L,1),    Edge(4L, 2L, 1),
                       Edge(5L, 1L,1)))


val graph = Graph(v,e)

Now extract the edges and vertices into Dataframes:

val vDf = graph.vertices.toDF("vId","vName")
val eDf =graph.edges.toDF("person","car","attr")

Transform the data into the desired output

eDf.drop("attr").join(vDf,'person === 'vId).drop("vId","person").withColumnRenamed("vName","person")
.join(vDf,'car === 'vId).drop("car","vId")
.groupBy("person")
.agg(collect_set('vName)).toDF("person","car")
.show()


+-------+------------+
| person|         car|
+-------+------------+
|person2|      [car1]|
|person1|[car2, car1]|
+-------+------------+

Upvotes: 1

Related Questions