Reputation: 1459
I have a data frame with the following format.
ID | Value
1 | AAA
2 | XXX
3 | BBB
1 | XXX
2 | CCC
3 | DDD
1 | YYY
2 | DDD
3 | XXX
How can I find the intersection of the value sets across IDs, i.e. the values that appear under every ID?
1 -> AAA,XXX,YYY
2 -> XXX,CCC,DDD
3 -> BBB,DDD,XXX
Expected result: XXX
Thank you very much in advance!
Upvotes: 0
Views: 103
Reputation: 4540
Group by Value and keep only the groups in which every distinct ID is present:
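import org.apache.spark.sql.functions.countDistinct

// Total number of distinct IDs in the data frame
val cnt = df.select($"ID").distinct().count()

// Keep only the values that occur under every ID
df.groupBy($"Value")
  .agg(countDistinct("ID") as "cnt")
  .filter($"cnt" === cnt)
  .select($"Value")
  .show()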
Output:
+-----+
|Value|
+-----+
|  XXX|
+-----+
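To see why counting distinct IDs per value gives the intersection, here is a minimal sketch of the same logic on plain Scala collections, without Spark. The `rows` sequence is a hypothetical in-memory copy of the data from the question, and the names `idCount`, `byCount` and `byIntersect` are only illustrative. It should run as a Scala script or in the REPL.

```scala
// Hypothetical in-memory copy of the question's data as (ID, Value) pairs
val rows = Seq(
  (1, "AAA"), (2, "XXX"), (3, "BBB"),
  (1, "XXX"), (2, "CCC"), (3, "DDD"),
  (1, "YYY"), (2, "DDD"), (3, "XXX")
)

// Counting approach: a value is common if its distinct IDs cover all IDs
val idCount = rows.map(_._1).distinct.size
val byCount = rows
  .groupBy(_._2)
  .collect { case (value, group) if group.map(_._1).distinct.size == idCount => value }
  .toSet

// Direct approach: build one set of values per ID, then intersect the sets
val byIntersect = rows
  .groupBy(_._1)
  .values
  .map(_.map(_._2).toSet)
  .reduce(_ intersect _)

println(byCount)      // Set(XXX)
println(byIntersect)  // Set(XXX)
```

Both approaches agree on this data. Counting distinct IDs rather than rows matters: if the same (ID, Value) pair appeared twice, a plain row count could reach the number of IDs without the value being present under every ID.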
Upvotes: 3