xzk

Reputation: 877

Scala/Spark - group a DataFrame and collect values from another column into a list

I'm trying to create a new DataFrame from an existing DataFrame in Scala/Spark.

Below is my existing DataFrame:

+----------+-----------------------+
|     group|                  value|
+----------+-----------------------+
|         4|[blah blah blah blah...|
|         0|[blah blah blah blah...|
|         1|[blah blah blah blah...|
|         1|[blah blah blah blah...|
|         0|[blah blah blah blah...|
|         2|[blah blah blah blah...|
|         0|[blah blah blah blah...|
|         2|[blah blah blah blah...|
// and so on
+----------+-----------------------+

Now I want to group the DataFrame above by the group column and, for each group, aggregate the value column into a list of the original values, producing something like this:

+----------+---------------------------------+
|     group|                            value|
+----------+---------------------------------+
|         0|[[blah blah...],[blah blah...]...|
|         1|[[blah blah...],[blah blah...]...|
|         2|[[blah blah...],[blah blah...]...|
|         3|[[blah blah...],[blah blah...]...|
|         4|[[blah blah...],[blah blah...]...|
+----------+---------------------------------+

How can I achieve this?

Upvotes: 1

Views: 596

Answers (1)

s.polam

Reputation: 10382

Try the code below. It groups by group and aggregates value with collect_list.

scala> df.show(false)
+-----+---------------------+
|group|value                |
+-----+---------------------+
|4    |[blah blah blah blah]|
|0    |[blah blah blah blah]|
|1    |[blah blah blah blah]|
|1    |[blah blah blah blah]|
|0    |[blah blah blah blah]|
|2    |[blah blah blah blah]|
|0    |[blah blah blah blah]|
|2    |[blah blah blah blah]|
+-----+---------------------+

// Collect every value of a group into one list, then sort the result by group
df
  .groupBy($"group")
  .agg(collect_list($"value").as("value"))
  .orderBy($"group".asc)
  .show(false)

+-----+---------------------------------------------------------------------+
|group|value                                                                |
+-----+---------------------------------------------------------------------+
|0    |[[blah blah blah blah], [blah blah blah blah], [blah blah blah blah]]|
|1    |[[blah blah blah blah], [blah blah blah blah]]                       |
|2    |[[blah blah blah blah], [blah blah blah blah]]                       |
|4    |[[blah blah blah blah]]                                              |
+-----+---------------------------------------------------------------------+
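
Outside spark-shell the $"..." syntax and collect_list need explicit imports, so here is a minimal standalone sketch of the same aggregation. The object name GroupValues, the helper collectByGroup, the local[*] master and the sample rows are all assumptions made for illustration; they are not from the question. It also assumes value is an array of strings, which the question does not state.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list}

// Hypothetical wrapper object, named only for this sketch.
object GroupValues {

  // Collect every `value` of a group into one array column,
  // then sort the result by the group key.
  def collectByGroup(df: DataFrame): DataFrame =
    df.groupBy(col("group"))
      .agg(collect_list(col("value")).as("value"))
      .orderBy(col("group").asc)

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; master and app name are assumptions.
    val spark = SparkSession.builder()
      .appName("group-values-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows: `value` is assumed to be an array of strings.
    val df = Seq(
      (4, Seq("blah", "blah")),
      (0, Seq("blah", "blah")),
      (1, Seq("blah", "blah")),
      (1, Seq("blah", "blah")),
      (0, Seq("blah", "blah"))
    ).toDF("group", "value")

    collectByGroup(df).show(false)

    spark.stop()
  }
}
```

Note that collect_list keeps duplicates and does not guarantee the order of elements inside each list after the shuffle. If you need distinct values, collect_set is the usual alternative.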

Upvotes: 2
