Reputation: 159
I have a Spark DataFrame with the columns ID, Date, and Name.
I want to first groupBy("ID"),
then order each group by "Date",
and then concatenate the names.
So this dataframe:
ID Date Name
1 01-02-2019 x
1 04-02-2019 z
2 05-03-2019 b
1 03-02-2019 y
2 02-03-2019 a
should convert into this:
ID Name_concat
1 x,y,z
2 a,b
Please provide the Spark Scala syntax to accomplish this.
The code below concatenates the names per id, but it does not maintain the order:
df.orderBy("id","date").groupBy("id").agg(concat_ws(", ", collect_list($"name")).as("all_name"))
Upvotes: 0
Views: 1043
Reputation: 6228
df.show
+---+----------+---+
| id| Date| v|
+---+----------+---+
| 1|2019-02-01| x|
| 1|2019-02-04| z|
| 2|2019-05-03| a|
| 1|2019-02-03| y|
| 2|2019-05-02| b|
| 2|2019-05-06| c|
+---+----------+---+
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val window = Window.partitionBy(col("id")).orderBy(col("Date"))
// collect_list over the ordered window builds a cumulative list per row;
// last() then keeps the final list for each id
df.withColumn("test", collect_list("v").over(window)).groupBy("id").agg(last("test")).show
+---+-----------------+
| id|last(test, false)|
+---+-----------------+
| 1| [x, y, z]|
| 2| [b, a, c]|
+---+-----------------+
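If the goal is the comma-separated Name_concat column from the question, one way to extend the same window idea (a sketch; the rowsBetween frame, the distinct step, and the fullWindow/names names are my additions, not part of the answer above) is:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// give every row in a partition the full ordered list for its id
val fullWindow = Window.partitionBy(col("id"))
  .orderBy(col("Date"))
  .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

df.withColumn("names", collect_list("v").over(fullWindow))
  .select(col("id"), concat_ws(",", col("names")).as("Name_concat"))
  .distinct
  .show

For the sample data shown above this should yield 1 -> x,y,z and 2 -> b,a,c.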
Upvotes: 1