Vikrant

Reputation: 159

In a Spark DataFrame, I want to groupBy, then orderBy, and then concatenate rows of another column

I have a Spark DataFrame with the columns ID, Date, and Name.

I want to first groupBy("ID"), then orderBy("Date"), and then concatenate the Name values.

So this dataframe:

ID  Date          Name
1   01-02-2019    x
1   04-02-2019    z
2   05-03-2019    b
1   03-02-2019    y
2   02-03-2019    a

should convert into this:

ID  Name_concat
1   x,y,z
2   a,b

Please provide the Spark Scala syntax to accomplish the above.

The following code concatenates the strings per id, but it does not maintain the order:

df.orderBy("id","date").groupBy("id").agg(concat_ws(", ", collect_list($"name")).as("all_name"))

Upvotes: 0

Views: 1043

Answers (1)

undefined_variable

Reputation: 6228

df.show
+---+----------+---+
| id|      Date|  v|
+---+----------+---+
|  1|2019-02-01|  x|
|  1|2019-02-04|  z|
|  2|2019-05-03|  a|
|  1|2019-02-03|  y|
|  2|2019-05-02|  b|
|  2|2019-05-06|  c|
+---+----------+---+


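import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, collect_list, last}

// Running list of "v" per id in Date order; the final row of each partition holds the full list.
val window = Window.partitionBy(col("id")).orderBy(col("Date"))
df.withColumn("test", collect_list("v").over(window)).groupBy("id").agg(last("test")).show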

+---+-----------------+
| id|last(test, false)|
+---+-----------------+
|  1|        [x, y, z]|
|  2|        [b, a, c]|
+---+-----------------+
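
The result above is still an array rather than the comma-separated string the question asks for. As a possible alternative (not the approach used above), here is a minimal sketch that collects (date, name) pairs per ID, sorts them with sort_array, and joins the names with concat_ws. It assumes the question's column names (ID, Date, Name) and that the dates are strings in dd-MM-yyyy format; the object name ConcatInOrder, the helper concatInOrder, and the intermediate columns parsed_date and pairs are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, concat_ws, sort_array, struct, to_date}

object ConcatInOrder {

  // Hypothetical helper: returns one row per ID with the names joined in date order.
  def concatInOrder(df: DataFrame): DataFrame =
    df
      // Assumption: Date is a string such as "01-02-2019" (dd-MM-yyyy).
      .withColumn("parsed_date", to_date(col("Date"), "dd-MM-yyyy"))
      .groupBy("ID")
      // Structs sort by their first field, so putting the date first orders the pairs by date.
      .agg(sort_array(collect_list(struct(col("parsed_date"), col("Name")))).as("pairs"))
      // Selecting a field from an array of structs yields an array of that field's values.
      .select(col("ID"), concat_ws(",", col("pairs.Name")).as("Name_concat"))
      .orderBy("ID")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("concat-in-order")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val df = Seq(
      (1, "01-02-2019", "x"),
      (1, "04-02-2019", "z"),
      (2, "05-03-2019", "b"),
      (1, "03-02-2019", "y"),
      (2, "02-03-2019", "a")
    ).toDF("ID", "Date", "Name")

    concatInOrder(df).show()
    spark.stop()
  }
}
```

With the question's sample rows this should give x,y,z for ID 1 and a,b for ID 2. Because the sort happens inside the aggregation, the result does not depend on the order in which rows reach collect_list, which is what the orderBy-before-groupBy attempt in the question relies on.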

Upvotes: 1
