Reputation: 2611
I have a DataFrame like the one below:
+-------+------+------+------+------+
| cond  | val  | val1 | val2 | val3 |
+-------+------+------+------+------+
| cond1 | 1    | null | null | null |
| cond1 | null | 2    | null | null |
| cond1 | null | null | 3    | null |
| cond1 | null | null | null | 4    |
| cond2 | null | null | null | 44   |
| cond2 | null | 22   | null | null |
| cond2 | null | null | 33   | null |
| cond2 | 11   | null | null | null |
| cond3 | null | null | null | 444  |
| cond3 | 111  | 222  | null | null |
| cond3 | 1111 | null | null | null |
| cond3 | null | null | 333  | null |
+-------+------+------+------+------+
I want to reduce the number of rows by grouping on the cond column, so that each cond value has a single row holding its non-null values. The resulting DataFrame should look like this:
+-------+----------+------+------+------+
| cond  | val      | val1 | val2 | val3 |
+-------+----------+------+------+------+
| cond1 | 1        | 2    | 3    | 4    |
| cond2 | 11       | 22   | 33   | 44   |
| cond3 | 111,1111 | 222  | 333  | 444  |
+-------+----------+------+------+------+
Upvotes: 0
Views: 1259
Reputation: 18118
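Try using .groupBy() with a single .agg() call that takes all the aggregations. Chaining several .agg() calls does not work, because the first one already returns a DataFrame that no longer has the val1, val2 and val3 columns. For example:
import org.apache.spark.sql.functions.collect_list

// collect_list skips nulls, so each group keeps only its non-null values
val output = input.groupBy("cond")
  .agg(
    collect_list("val").name("val"),
    collect_list("val1").name("val1"),
    collect_list("val2").name("val2"),
    collect_list("val3").name("val3"))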
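collect_list produces array columns, while the desired output in the question shows comma-separated values such as 111,1111. One way to get that shape is to join each collected list with concat_ws. The following is a minimal sketch, not a definitive implementation: the object name CollapseByCond, the helper collapse, and the local SparkSession setup are hypothetical, and it assumes the DataFrame has at least one column besides cond.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

object CollapseByCond {
  // Hypothetical helper: one output row per `cond`, non-null values joined with ",".
  // Assumes at least one column other than `cond` exists (aggs.head fails otherwise).
  def collapse(input: DataFrame): DataFrame = {
    val valueCols = input.columns.filter(_ != "cond")
    // collect_list drops nulls; the cast turns the elements into strings for concat_ws.
    val aggs = valueCols.map { c =>
      concat_ws(",", collect_list(col(c).cast("string"))).as(c)
    }
    input.groupBy("cond").agg(aggs.head, aggs.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("collapse-by-cond")
      .getOrCreate()
    import spark.implicits._

    // The cond3 rows from the question; None stands for null.
    val input = Seq[(String, Option[Int], Option[Int], Option[Int], Option[Int])](
      ("cond3", None, None, None, Some(444)),
      ("cond3", Some(111), Some(222), None, None),
      ("cond3", Some(1111), None, None, None),
      ("cond3", None, None, Some(333), None)
    ).toDF("cond", "val", "val1", "val2", "val3")

    collapse(input).show(truncate = false)
    spark.stop()
  }
}
```

Note that collect_list does not guarantee the order of the collected elements after a shuffle, so a group with several values may come out as 1111,111 rather than 111,1111. The resulting columns are strings, not numbers.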
Upvotes: 1