Biuni

Reputation: 23

How to create a graph from a list with Spark Graphx

I have a list in Scala, like this:

  val log = List(
    List("a","b","c"),
    List("a","c","b","h","c"),
    List("a","d","e"),
    List("a","d","e","f","d","e")
  )

and I want to create a graph like this:

[image: graph]

with a method that creates these two arrays:

  val vertexName: RDD[(VertexId, (String))] =
    sc.parallelize(Array((1L, ("a")), (2L, ("b")),
                         (3L, ("c")), (4L, ("d")),
                         (5L, ("e")), (6L, ("f")),
                         (7L, ("h"))))

  val edgeName: RDD[Edge[String]] =
    sc.parallelize(Array(Edge(1L, 2L, "1"), Edge(2L, 3L, "1"),
                         Edge(1L, 3L, "1"), Edge(3L, 2L, "1"),
                         Edge(2L, 7L, "1"), Edge(7L, 3L, "1"),
                         Edge(1L, 4L, "1"), Edge(4L, 5L, "1"),
                         Edge(5L, 6L, "1"), Edge(6L, 4L, "1")))

  val graph = Graph(vertexName, edgeName)

Is this possible? Is there a way to do it?

Upvotes: 1

Views: 1704

Answers (1)

Oli

Reputation: 10406

I am assuming that each inner list of vertices is a path that should exist in the graph.

I would start by building a mapping between vertex names and their VertexId:

val vertices = log.flatMap(x=> x).toSet.toSeq
val vertexMap = (0 until vertices.size)
    .map(i => vertices(i) -> i.toLong)
    .toMap
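
One thing to keep in mind with the mapping above: the iteration order of a Set is not specified, so the IDs will not necessarily follow the numbering used in the question. If a predictable assignment matters, a minimal sketch could sort the distinct names first. The 1-based numbering here is only an assumption made to mirror the question's IDs; the names `names` and `vertexMap` are illustrative.

```scala
val log = List(
  List("a", "b", "c"),
  List("a", "c", "b", "h", "c"),
  List("a", "d", "e"),
  List("a", "d", "e", "f", "d", "e")
)

// Sorting the distinct names makes the ID assignment reproducible.
val names: List[String] = log.flatten.distinct.sorted

// Number from 1 (an assumption, to mirror the IDs in the question).
val vertexMap: Map[String, Long] =
  names.zipWithIndex.map { case (name, i) => name -> (i + 1).toLong }.toMap
```

With the sample log this maps "a" through "f" to 1 through 6 and "h" to 7, which is the numbering the question uses.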

Then I would generate the edges as a set (to avoid duplicates), using the vertex map to translate names into IDs:

val edgeSet = log
    .filter(_.size > 1) // a list with a single vertex is not a path
    .flatMap(list => list.indices.tail.map(i => list(i - 1) -> list(i))) // consecutive pairs
    .map(x => Edge(vertexMap(x._1), vertexMap(x._2), "1"))
    .toSet // drop duplicate edges
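
As an alternative way to form the consecutive pairs, each path can be zipped with itself shifted by one element. This is only a sketch on plain name pairs (no Spark needed), and `namePairs` is an illustrative name. It avoids positional indexing on a List, which takes linear time per access, and needs no size filter because a list with fewer than two elements simply yields no pairs.

```scala
val log = List(
  List("a", "b", "c"),
  List("a", "c", "b", "h", "c"),
  List("a", "d", "e"),
  List("a", "d", "e", "f", "d", "e")
)

// Pair each element with its successor: List(a, b, c) gives (a,b) and (b,c).
// drop(1) is used instead of tail so that an empty list does not throw.
val namePairs: Set[(String, String)] =
  log.flatMap(path => path.zip(path.drop(1))).toSet
```

Each pair can then be turned into an edge exactly as above, with `Edge(vertexMap(src), vertexMap(dst), "1")`.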

Finally, I would create the graph:

val edges = sc.parallelize(edgeSet.toSeq)
val vertexNames = sc.parallelize(vertexMap.toSeq.map(_.swap))
val graph = Graph(vertexNames, edges)
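
For reference, here is how the steps might fit together as a standalone program. This is a sketch under a few assumptions: Spark core and GraphX are on the classpath, it runs with a local master, and the object name `LogToGraph` and the helper `build` are hypothetical names chosen for illustration. It sorts the names and numbers them from 1 so that the IDs mirror the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object LogToGraph {
  // Hypothetical helper: vertices are the distinct names in `log`,
  // edges connect consecutive names within each inner list.
  def build(sc: SparkContext, log: List[List[String]]): Graph[String, String] = {
    val vertexMap: Map[String, VertexId] =
      log.flatten.distinct.sorted.zipWithIndex
        .map { case (name, i) => name -> (i + 1).toLong }
        .toMap

    // Edge is a case class, so collecting into a Set removes duplicates.
    val edgeSet: Set[Edge[String]] = log
      .flatMap(path => path.zip(path.drop(1)))
      .map { case (src, dst) => Edge(vertexMap(src), vertexMap(dst), "1") }
      .toSet

    val vertexNames: RDD[(VertexId, String)] =
      sc.parallelize(vertexMap.toSeq.map(_.swap))
    val edges: RDD[Edge[String]] = sc.parallelize(edgeSet.toSeq)
    Graph(vertexNames, edges)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LogToGraph").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val log = List(
        List("a", "b", "c"),
        List("a", "c", "b", "h", "c"),
        List("a", "d", "e"),
        List("a", "d", "e", "f", "d", "e")
      )
      val graph = build(sc, log)
      // Print every edge with the names of its endpoints.
      graph.triplets.collect().foreach(t => println(s"${t.srcAttr} -> ${t.dstAttr}"))
    } finally {
      sc.stop()
    }
  }
}
```

Printing the triplets is a quick way to check that the edges carry the expected vertex names; the order of the printed lines is not guaranteed.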

Upvotes: 1
