I have a DataFrame in Spark:
+------+----------+
|   sno|       ssn|
+------+----------+
|   123| 200000000|
|   789| 200000002|
|   123| 200000000|
|   123| 200000001|
|   894| 200000001|
+------+----------+
I want to group by sno (the serial number) so that the resulting DataFrame looks like this:
+------+-------------------+
|   sno|ssn                |
+------+-------------------+
|   123|200000000,200000001|
|   789|200000002          |
|   894|200000001          |
+------+-------------------+
I am new to Spark. When I register the DataFrame as a temp table and run a SQL GROUP BY, I can't get the results in the format above. How do I get this result?
Upvotes: 1
Views: 48
Reputation: 1076
You can use collect_set after grouping by sno. It collects the distinct ssn values of each group into an array. The code is below.
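// Assumes a SparkSession named `spark` (predefined in spark-shell)
import org.apache.spark.sql.functions.collect_set
import spark.implicits._

// Create test data
val df = Seq((123, 200000000), (789, 200000002), (123, 200000000), (123, 200000001), (894, 200000001))
  .toDF("sno", "ssn")

// Group by sno and collect the distinct ssn values of each group into an array
val df1 = df.groupBy("sno")
  .agg(collect_set("ssn").as("ssn"))
df1.show(false)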
+---+----------------------+
|sno|ssn                   |
+---+----------------------+
|123|[200000000, 200000001]|
|789|[200000002]           |
|894|[200000001]           |
+---+----------------------+
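The result above is an array column, while the question asks for a comma-separated string. One possible way to get there is to wrap the collected set in concat_ws. The sketch below assumes Spark 2.x or later and a local session; the object name GroupSsnBySno, the helper joinSsnBySno, and the view name people are made up for illustration. It also sorts the values with sort_array, since collect_set does not guarantee an order, and shows a SQL variant for the temp table approach mentioned in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_set, concat_ws, sort_array}

object GroupSsnBySno {
  // Hypothetical helper: collapse the distinct ssn values of each sno
  // into a single comma-separated string.
  def joinSsnBySno(df: DataFrame): DataFrame =
    df.groupBy("sno")
      .agg(
        concat_ws(
          ",",
          // Sort while the values are still numeric, then cast to
          // array<string> because concat_ws works on strings.
          sort_array(collect_set(col("ssn"))).cast("array<string>")
        ).as("ssn")
      )
      .orderBy("sno")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("group-ssn-by-sno")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((123, 200000000), (789, 200000002), (123, 200000000),
                 (123, 200000001), (894, 200000001))
      .toDF("sno", "ssn")

    // DataFrame API version
    joinSsnBySno(df).show(false)

    // SQL version, using a temp view (the view name is an assumption)
    df.createOrReplaceTempView("people")
    spark.sql(
      """SELECT sno,
        |       concat_ws(',', cast(sort_array(collect_set(ssn)) AS array<string>)) AS ssn
        |FROM people
        |GROUP BY sno
        |ORDER BY sno""".stripMargin
    ).show(false)

    spark.stop()
  }
}
```

If duplicates should be kept rather than dropped, collect_list can be used in place of collect_set.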
Upvotes: 2