Reputation: 21
Below are the vertices and edges I constructed in apache graphX. All edges go in both directions. How do collect connected components so that the output is ("Alice","Bon","Charlie","David","Ed","Fran"),("Bosh","Bish","Posh")
val vertexArray = Array(
(1L, ("Alice", 28)),
(2L, ("Bob", 27)),
(3L, ("Charlie", 65)),
(4L, ("David", 42)),
(5L, ("Ed", 55)),
(6L, ("Fran", 51)),
(7L, ("Bosh", 520)),
(8L, ("Bish", 50)),
(9L, ("Posh", 50))
)
val edgeArray = Array(
Edge(2L, 1L, 1),
Edge(2L, 4L, 1),
Edge(3L, 2L, 1),
Edge(3L, 6L, 1),
Edge(4L, 1L, 1),
Edge(5L, 2L, 1),
Edge(5L, 3L, 1),
Edge(5L, 6L, 1),
Edge(7L, 8L, 1),
Edge(8L, 9L, 1),
Edge(1L, 2L, 1),
Edge(4L, 2L, 1),
Edge(2L, 3L, 1),
Edge(3L, 4L, 1),
Edge(1L, 5L, 1),
Edge(2L, 5L, 1),
Edge(3L, 5L, 1),
Edge(6L, 5L, 1),
Edge(8L, 7L, 1),
Edge(9L, 8L, 1))
Upvotes: 1
Views: 130
Reputation: 5325
You can join the resulting graph's vertices with your original graph's vertices, as explained in the documentation.
So first let us get a clean mapping id -> name
val users = graph.vertices.map{case (id, (name, age)) => (id, name)}
Then get the vertices of the connected components
val cc = graph.connectedComponents.vertices
As stated in the documentation,
The connected components algorithm labels each connected component of the graph with the ID of its lowest-numbered vertex
In your example there are two labels, either 1 for Alice,Bon,Charlie,David,Ed and Fran, or 7 for the three others. So we can now join the connected components with our original vertices and group by the label. After a bit of cleaning we get the desired result
val groups = users.join(cc)
.map{case (id, (name, label)) => (name, label)}
.groupBy(_._2)
.map{case (label, users) => users.map(_._1)}
Upvotes: 1