gannina

Reputation: 183

Merge multiple graphs together in GraphX

Hi, I have built multiple graphs (11 in total), for example:

Graph 1 - SongArtist: SongVertex(Id, SongName), ArtistVertex(Id, ArtistName, NetWorth), Edge(Song, Artist, "Sung")

Graph 2 - SongWriter: SongVertex(Id, SongName), WriterVertex(Id, ArtistName), Edge(Song, Writer, "WrittenBy")

Graph 3 - ArtistWriter: ArtistVertex(Id, ArtistName, NetWorth), WriterVertex(Id, ArtistName), Edge(Artist, Writer, "Collaborated") ...

I want to merge all of them into one graph. Graph1 and Graph2 can be merged on Song, Graph2 and Graph3 on Writer, and Graph1 and Graph3 on Artist.

Some graphs have edge and vertex properties defined by case classes. The following shows how Graph3 was built; the others follow more or less the same structure:

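import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// EdgeProperty and VertexProperty are base types defined elsewhere (not shown).
case class ArtistWriterProperties(weight: String, edgeType: String) extends EdgeProperty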
case class ArtistProperty(val vertexType: String, val artistName: String, val netWorth: String) extends VertexProperty
case class WriterProperty(val vertexType: String, val writerName: String) extends VertexProperty

val ArtistWriter: RDD[(VertexId, VertexProperty)] = sc.textFile(vertexArtistWriter).map {
  line =>
    val row = line.split(",")
    val id = row(0).toLong
    val vertexType = row(1)
    val prop = vertexType match {
      case "Artist" => ArtistProperty(vertexType, row(2), row(3))
      case "Writer" => WriterProperty(vertexType, row(2))
    }
    (id, prop)
}

val edgesArtistWriterCollaborated: RDD[Edge[EdgeProperty]] = sc.textFile(edgeWeightedArtistWriterCollaborated).map {
  line =>
    val row = line.split(",")
    Edge(row(0).toLong, row(1).toLong, ArtistWriterProperties(row(2), row(3)))
}

val graph3 = Graph(ArtistWriter, edgesArtistWriterCollaborated)

I tried something like this:

val graph2And3 = Graph(
  graph2.vertices.union(graph3.vertices),
  graph2.edges.union(graph3.edges)
).partitionBy(RandomVertexCut).
  groupEdges( (attr1, attr2) => attr1 + attr2 )

But I am getting a type mismatch error.

Upvotes: 0

Views: 810

Answers (1)

Vladislav Varslavans

Reputation: 2934

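So basically you need to perform a join for the vertices and a union for the edges.

For each graph you can get an RDD of vertices (graph.vertices) and an RDD of edges (graph.edges).

1) Sequentially full outer join the vertex RDDs on the required keys and create new IDs for the final vertices, e.g. (pseudocode) graph1.vertices.fullOuterJoin(graph2.vertices).fullOuterJoin... Spark's fullOuterJoin joins pair RDDs on their key, so each RDD has to be keyed by the attribute you are merging on.

2) Union all the edge RDDs (remapping source and destination IDs if the vertex IDs changed in step 1), then create the Graph from the new vertex RDD and the new edge RDD.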
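The two steps above can be sketched roughly as follows. This is a minimal sketch under assumptions, not a drop-in solution: it assumes every graph is typed as Graph[VertexProperty, EdgeProperty] and that a vertex shared between graphs already has the same VertexId in each of them, so no new IDs or edge remapping are needed. The traits, SongProperty, TypedEdge, and the object and method names are hypothetical stand-ins, since the question does not show how VertexProperty and EdgeProperty are declared.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical base traits: the question does not show how these are declared.
sealed trait VertexProperty
sealed trait EdgeProperty

case class SongProperty(vertexType: String, songName: String) extends VertexProperty
case class ArtistProperty(vertexType: String, artistName: String, netWorth: String) extends VertexProperty
case class WriterProperty(vertexType: String, writerName: String) extends VertexProperty
case class TypedEdge(weight: String, edgeType: String) extends EdgeProperty

object MergeGraphsSketch {
  type PropGraph = Graph[VertexProperty, EdgeProperty]
  type VertexPairs = RDD[(VertexId, VertexProperty)]

  // Step 1: full outer join on VertexId. When both sides contain the vertex,
  // the left property is kept; a full outer join always has at least one side.
  def joinOnId(left: VertexPairs, right: VertexPairs): VertexPairs =
    left.fullOuterJoin(right).mapValues { case (l, r) => l.orElse(r).get }

  // Assumes shared vertices carry the same VertexId in every input graph.
  def merge(graphs: Seq[PropGraph]): PropGraph = {
    require(graphs.nonEmpty, "need at least one graph")
    val vertices: VertexPairs =
      graphs.map(g => (g.vertices: VertexPairs)).reduce((acc, next) => joinOnId(acc, next))
    // Step 2: plain union of all edge RDDs (duplicates are kept).
    val edges: RDD[Edge[EdgeProperty]] =
      graphs.map(g => (g.edges: RDD[Edge[EdgeProperty]])).reduce((acc, next) => acc.union(next))
    Graph(vertices, edges)
  }

  // Optional: collapse parallel edges. The merge function must return the
  // edge attribute type itself; here the first attribute is simply kept.
  def collapseParallelEdges(graph: PropGraph): PropGraph =
    graph.partitionBy(PartitionStrategy.RandomVertexCut).groupEdges((first, _) => first)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("merge-graphs-sketch"))
    try {
      // Made-up sample data: the writer (id 2) appears in both graphs.
      val writer: (VertexId, VertexProperty) = (2L, WriterProperty("Writer", "Some Writer"))
      val graph2: PropGraph = Graph(
        sc.parallelize(Seq[(VertexId, VertexProperty)](
          (1L, SongProperty("Song", "Some Song")), writer)),
        sc.parallelize(Seq[Edge[EdgeProperty]](Edge(1L, 2L, TypedEdge("1", "WrittenBy")))))
      val graph3: PropGraph = Graph(
        sc.parallelize(Seq[(VertexId, VertexProperty)](
          (3L, ArtistProperty("Artist", "Some Artist", "0")), writer)),
        sc.parallelize(Seq[Edge[EdgeProperty]](Edge(3L, 2L, TypedEdge("1", "Collaborated")))))

      val merged = merge(Seq(graph2, graph3))
      println(s"vertices=${merged.vertices.count()}, edges=${merged.edges.count()}")
    } finally sc.stop()
  }
}
```

If the IDs are not shared across graphs, the vertex RDDs would first have to be re-keyed by a natural key (song name, writer name, and so on) before the join, and the edges remapped to the new IDs. As for groupEdges, its merge function has to take two edge attributes and return one of the same type; attr1 + attr2 is not defined for a case class unless you add such a method yourself, which may be the source of the type mismatch in the question, although the exact error is not shown.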

Upvotes: 1
