vkt

Reputation: 1459

Intersection in a dataframe

I have a data frame with the following format.

ID | Value
1  | AAA
2  | XXX
3  | BBB
1  | XXX
2  | CCC
3  | DDD
1  | YYY
2  | DDD
3  | XXX

How can I find the values that are common to every ID, i.e. the intersection of the per-ID value sets?

1 -> AAA,XXX,YYY
2 -> XXX,CCC,DDD
3 -> BBB,DDD,XXX

Expected result: XXX

Thank you very much in advance!

Upvotes: 0

Views: 103

Answers (1)

ollik1

Reputation: 4540

Group by Value and keep only the values whose number of distinct IDs equals the total number of distinct IDs in the data frame:

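import org.apache.spark.sql.functions.countDistinct
import spark.implicits._ // assumes a SparkSession named `spark`; enables the $"..." syntax

// Total number of distinct IDs in the data frame
val cnt = df.select($"ID").distinct().count()

df.groupBy($"Value")
  .agg(countDistinct("ID") as "cnt") // distinct IDs under which this Value occurs
  .filter($"cnt" === cnt)            // keep Values that occur under every ID
  .select($"Value")
  .show()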

Output:

+-----+
|Value|
+-----+
|  XXX|
+-----+
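
To see why the count comparison amounts to an intersection, here is a minimal sketch of the same idea using plain Scala collections instead of Spark. The names `rows`, `valuesById` and `common` are made up for illustration, and the data is just the sample from the question; it is meant for a REPL or worksheet, not as a replacement for the DataFrame version on large data.

```scala
// Sample rows from the question as (ID, Value) pairs
val rows = Seq(
  1 -> "AAA", 2 -> "XXX", 3 -> "BBB",
  1 -> "XXX", 2 -> "CCC", 3 -> "DDD",
  1 -> "YYY", 2 -> "DDD", 3 -> "XXX"
)

// Build one set of values per ID
val valuesById: Map[Int, Set[String]] =
  rows.groupBy(_._1).map { case (id, pairs) => id -> pairs.map(_._2).toSet }

// Intersect all per-ID sets; reduceOption avoids an exception on empty input
val common: Set[String] =
  valuesById.values.reduceOption(_ intersect _).getOrElse(Set.empty[String])

println(common) // Set(XXX)
```

A value survives the intersection exactly when it appears under every ID, which is the same condition as its distinct-ID count being equal to the total number of distinct IDs.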

Upvotes: 3
