Reputation: 207
To summarize, I want to put values that are associated with each other into the same group.
Here is what I have:
col1 col2
1 2
1 3
2 3
4 5
5 6
And I want this:
col1 col2 group
1 2 1
1 3 1
2 3 1
4 5 2
5 6 2
To produce those two groups, here are the steps I would follow if I did it manually.
Do you have an idea of how to solve this in SQL? I am using Hive or PySpark.
Upvotes: 0
Views: 78
Reputation: 207
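Based on A.R.Ferguson's answer, I was able to figure out the solution using PySpark and GraphFrames:
from graphframes import GraphFrame

# Recent GraphFrames versions need a checkpoint directory for connectedComponents().
# `sc` is the SparkContext from the pyspark shell; the path is only an example.
sc.setCheckpointDir("/tmp/graphframes-checkpoints")

# Vertices need an "id" column; the ids are the values found in col1/col2.
vertices = sqlContext.createDataFrame([
    ("A", 1),
    ("B", 2),
    ("C", 3),
    ("D", 4),
    ("E", 5),
    ("F", 6)], ["name", "id"])

# Edges need "src" and "dst" columns; one row per (col1, col2) pair.
edges = sqlContext.createDataFrame([
    (1, 2),
    (1, 3),
    (2, 3),
    (4, 5),
    (5, 6)], ["src", "dst"])

g = GraphFrame(vertices, edges)
# Adds a "component" column; vertices in the same group share the same value.
result = g.connectedComponents()
result.show()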
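For checking the grouping on a small sample without Spark, here is a minimal pure-Python sketch of the same connected-components idea using union-find. This is not what GraphFrames does internally, and the function name `group_pairs` is just a hypothetical helper for illustration; it numbers groups in order of first appearance, starting at 1.

```python
def group_pairs(pairs):
    """Return (col1, col2, group) rows; linked values share a group number."""
    parent = {}

    def find(x):
        # Follow parent links up to the root, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge the two sets that contain a and b.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    # Number the groups 1, 2, ... in order of first appearance.
    group_ids = {}
    rows = []
    for a, b in pairs:
        root = find(a)
        group = group_ids.setdefault(root, len(group_ids) + 1)
        rows.append((a, b, group))
    return rows


pairs = [(1, 2), (1, 3), (2, 3), (4, 5), (5, 6)]
rows = group_pairs(pairs)
for row in rows:
    print(*row)
```

This only works when the data fits in memory on one machine, so for real Hive or PySpark tables the GraphFrames approach above is the one to use.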
Thanks again Ferguson.
Upvotes: 1