For example, if I have two graphs with vertices and edges like this:
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
val vertexRdd1: RDD[(VertexId, (String, Int))] = sc.parallelize(Array(
(1L, ("a", 28)),
(2L, ("b", 27)),
(3L, ("c", 65))
))
val edgeRdd1: RDD[Edge[Int]] = sc.parallelize(Array(
Edge(1L, 2L, 1),
Edge(2L, 3L, 8)
))
val vertexRdd2: RDD[(VertexId, (String, Int))] = sc.parallelize(Array(
(1L, ("a", 28)),
(2L, ("b", 27)),
(3L, ("c", 28)),
(4L, ("d", 27)),
(5L, ("e", 65))
))
val edgeRdd2: RDD[Edge[Int]] = sc.parallelize(Array(
Edge(1L, 2L, 1),
Edge(2L, 3L, 4),
Edge(3L, 5L, 1),
Edge(2L, 4L, 1)
))
How can I count the edges that the two graphs have in common, ignoring the edge attribute? In the example above there are 2 common edges: Edge(1L, 2L, 1) matches Edge(1L, 2L, 1), and Edge(2L, 3L, 8) matches Edge(2L, 3L, 4).
I am programming in Scala.
Assuming you have graph1 (Graph(vertexRdd1, edgeRdd1)) and graph2 (Graph(vertexRdd2, edgeRdd2)), you can map each edge to its (srcId, dstId) pair and then use the intersection method:
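val graph1 = Graph(vertexRdd1, edgeRdd1)
val graph2 = Graph(vertexRdd2, edgeRdd2)
// Drop the edge attribute, keeping only the (srcId, dstId) pair of each edge
val srcDst1 = graph1.edges.map(e => (e.srcId, e.dstId))
val srcDst2 = graph2.edges.map(e => (e.srcId, e.dstId))
srcDst1.intersection(srcDst2).count()  // 2 for the example graphs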
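If you want to check the logic without a Spark cluster, here is a minimal plain-Scala sketch of the same idea on local collections. It assumes edges are modelled as hypothetical (srcId, dstId, attr) tuples instead of GraphX Edge objects; CommonEdges, commonPairs and matchedEdges are illustrative names, not part of any API. It also recovers the matched edge pairs listed in the question, not just their count.

```scala
// Plain-Scala sketch (no Spark). Edges are hypothetical (srcId, dstId, attr)
// tuples standing in for GraphX Edge[Int]; all names here are illustrative.
object CommonEdges {
  type SimpleEdge = (Long, Long, Int)

  // Edges of the first example graph
  val edges1: Seq[SimpleEdge] = Seq((1L, 2L, 1), (2L, 3L, 8))
  // Edges of the second example graph
  val edges2: Seq[SimpleEdge] = Seq((1L, 2L, 1), (2L, 3L, 4), (3L, 5L, 1), (2L, 4L, 1))

  // Drop the attribute and intersect the (srcId, dstId) sets; like
  // RDD.intersection, duplicates are removed because these are Sets.
  def commonPairs(a: Seq[SimpleEdge], b: Seq[SimpleEdge]): Set[(Long, Long)] = {
    val keysA = a.map(e => (e._1, e._2)).toSet
    val keysB = b.map(e => (e._1, e._2)).toSet
    keysA.intersect(keysB)
  }

  // Pair every edge of `a` with each edge of `b` that has the same
  // (srcId, dstId), keeping both attributes (a local analogue of a key-based join).
  def matchedEdges(a: Seq[SimpleEdge], b: Seq[SimpleEdge]): Seq[(SimpleEdge, SimpleEdge)] = {
    val bByKey = b.groupBy(e => (e._1, e._2))
    for {
      e1 <- a
      e2 <- bByKey.getOrElse((e1._1, e1._2), Seq.empty[SimpleEdge])
    } yield (e1, e2)
  }

  def main(args: Array[String]): Unit = {
    println(commonPairs(edges1, edges2).size)
    matchedEdges(edges1, edges2).foreach(println)
  }
}
```

Two points carry over to the RDD version: intersection removes duplicates, so parallel edges with the same endpoints are counted once, and the pairs are directed, so Edge(1L, 2L, ...) does not match Edge(2L, 1L, ...). To get the matched pairs on RDDs, one option (not tested here) is graph1.edges.keyBy(e => (e.srcId, e.dstId)).join(graph2.edges.keyBy(e => (e.srcId, e.dstId))), which, unlike intersection, does not deduplicate.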