Reputation: 183
Hi, I have built multiple graphs (11 in total), for example:
Graph 1 - SongArtist - SongVertex(Id, SongName), ArtistVertex(Id, ArtistName, NetWorth), Edge(Song, Artist, "Sung")
Graph 2 - SongWriter - SongVertex(Id, SongName), WriterVertex(Id, ArtistName), Edge(Song, Writer, "WrittenBy")
Graph 3 - ArtistWriter - ArtistVertex(Id, ArtistName, NetWorth), WriterVertex(Id, ArtistName), Edge(Artist, Writer, "Collaborated") ...
I want to merge all of them into one graph. Graph1 and Graph2 can be merged on Song, Graph2 and Graph3 on Writer, and Graph1 and Graph3 on Artist.
Some graphs have edge and vertex properties defined by case classes. The following shows how Graph3 was built; the others follow more or less the same structure:
case class ArtistWriterProperties(weight: String, edgeType: String) extends EdgeProperty
case class ArtistProperty(val vertexType: String, val artistName: String, val netWorth: String) extends VertexProperty
case class WriterProperty(val vertexType: String, val writerName: String) extends VertexProperty
val ArtistWriter: RDD[(VertexId, VertexProperty)] = sc.textFile(vertexArtistWriter).map {
  line =>
    val row = line.split(",")
    val id = row(0).toLong
    val vertexType = row(1)
    val prop = vertexType match {
      case "Artist" => ArtistProperty(vertexType, row(2), row(3))
      case "Writer" => WriterProperty(vertexType, row(2))
    }
    (id, prop)
}
val edgesArtistWriterCollaborated: RDD[Edge[EdgeProperty]] = sc.textFile(edgeWeightedArtistWriterCollaborated).map {
  line =>
    val row = line.split(",")
    Edge(row(0).toLong, row(1).toLong, ArtistWriterProperties(row(2), row(3)))
}
val graph3 = Graph(ArtistWriter, edgesArtistWriterCollaborated)
I was trying something of this sort:
val graph2And3 = Graph(
  graph2.vertices.union(graph3.vertices),
  graph2.edges.union(graph3.edges)
).partitionBy(RandomVertexCut).
  groupEdges( (attr1, attr2) => attr1 + attr2 )
But I am getting a type mismatch error.
Upvotes: 0
Views: 810
Reputation: 2934
So basically you need to perform a join for the vertices and a union for the edges.
For each graph you can get an RDD of vertices and an RDD of edges.
1) Sequentially full outer join the RDDs of vertices by the required keys and create new IDs for the final vertices, e.g. graph1.vertices.fullOuterJoin(graph2.vertices).fullOuterJoin...
2) Union all the RDDs of edges. You can then create a Graph from the new RDD of vertices and the new RDD of edges.
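The two steps above can be sketched roughly as follows. This is only a sketch under some assumptions: every graph is typed as Graph[VertexProperty, EdgeProperty], and a shared vertex (the same song, artist, or writer) already carries the same VertexId in every graph. If it does not, the vertices would first need to be re-keyed as step 1 describes. The trait definitions, GraphMergeSketch, joinVertices, mergeEdgeProps, and the rule of summing weights are all hypothetical names and choices made for illustration. The attr1 + attr2 in the question is a likely source of the type mismatch, since EdgeProperty defines no + of its own, so the sketch passes an explicit merge function instead.

```scala
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical stand-ins for the question's base traits and one edge type.
trait VertexProperty extends Serializable
trait EdgeProperty extends Serializable
case class ArtistWriterProperties(weight: String, edgeType: String) extends EdgeProperty

object GraphMergeSketch {
  type Vertices = RDD[(VertexId, VertexProperty)]

  // Assumed rule for parallel edges: sum the numeric weights and keep the
  // first edge type. Any other combination keeps the first property as is.
  def mergeEdgeProps(a: EdgeProperty, b: EdgeProperty): EdgeProperty = (a, b) match {
    case (ArtistWriterProperties(w1, t), ArtistWriterProperties(w2, _)) =>
      ArtistWriterProperties((w1.toDouble + w2.toDouble).toString, t)
    case _ => a
  }

  // Step 1: full outer join on the vertex ID. A vertex present on both
  // sides keeps the left property; otherwise whichever side has it wins.
  def joinVertices(left: Vertices, right: Vertices): Vertices =
    left.fullOuterJoin(right).mapValues { case (l, r) => l.orElse(r).get }

  def merge(graphs: Seq[Graph[VertexProperty, EdgeProperty]]): Graph[VertexProperty, EdgeProperty] = {
    require(graphs.nonEmpty, "need at least one graph to merge")
    val vertexRdds: Seq[Vertices] = graphs.map(_.vertices)
    val edgeRdds: Seq[RDD[Edge[EdgeProperty]]] = graphs.map(_.edges)

    val vertices = vertexRdds.reduce(joinVertices) // step 1: sequential joins
    val edges = edgeRdds.reduce(_ union _)         // step 2: union of all edges

    // groupEdges only merges edges that share a partition, so repartition first.
    Graph(vertices, edges)
      .partitionBy(PartitionStrategy.RandomVertexCut)
      .groupEdges(mergeEdgeProps)
  }
}
```

With the graphs from the question this would be called as GraphMergeSketch.merge(Seq(graph1, graph2, graph3)), provided each of them has the type Graph[VertexProperty, EdgeProperty]. The groupEdges call can be dropped if parallel edges between the same two vertices should be kept separate.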
Upvotes: 1