My input for Spark is below:
| Col_1 | Col_2 | Amount |
|---|---|---|
| 1 | 0 | 35/310320 |
| 1 | 1 | 35/5 |
| 1 | 1 | 180/-310350 |
| 17 | 1 | 0/1000 |
| 17 | 17 | 0/-1000 |
| 17 | 17 | 74/314322 |
| 17 | 17 | 74/5 |
| 17 | 17 | 185/-3142 |
I want to generate the Output below using Spark SQL:
| Output |
|---|
| 35/310320 |
| 35/310320/35/5 |
| 35/310320/35/5/180/-310350 |
| 0/1000 |
| 0/1000/0/-1000 |
| 0/1000/0/-1000/74/314322 |
| 0/1000/0/-1000/74/314322/74/5 |
| 0/1000/0/-1000/74/314322/74/5/185/-3142 |
Conditions & procedure: if the `col_1` and `col_2` values are not the same, take the current `Amount` value for the new Output column; if they are the same, concatenate all the previous `Amount` values with `/`.
For example, in the first row for `col_1` = 17 the `col_1` and `col_2` values differ, so the current amount `0/1000` is taken. In the next step both column values are the same, so the value becomes `0/1000/0/-1000`, and so on. I need to build this logic for dynamic data in Spark SQL or Spark Scala.
You can use `concat_ws` on the list of amount values obtained from `collect_list` over an appropriate window:
```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_list, concat_ws}

val df2 = df.withColumn(
  "output",
  // Collect every amount seen so far within the col_1 partition
  // (from the first row up to the current row) and join them with "/".
  concat_ws(
    "/",
    collect_list("amount").over(
      Window.partitionBy("col_1")
        .orderBy("col_2")
        .rowsBetween(Window.unboundedPreceding, 0)
    )
  )
)
```
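Since the question also asks for a Spark SQL variant, the same cumulative window can be expressed as a query over a temp view. The following is a minimal, self-contained sketch; the `SparkSession` value `spark`, the view name `input_table`, and the hard-coded sample rows are assumptions used for illustration, not part of the original answer:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cumulative-amounts").getOrCreate()
import spark.implicits._

// Rebuild the sample input from the question as a DataFrame.
val df = Seq(
  (1, 0, "35/310320"),
  (1, 1, "35/5"),
  (1, 1, "180/-310350"),
  (17, 1, "0/1000"),
  (17, 17, "0/-1000"),
  (17, 17, "74/314322"),
  (17, 17, "74/5"),
  (17, 17, "185/-3142")
).toDF("col_1", "col_2", "amount")

df.createOrReplaceTempView("input_table")

// collect_list as a window function accumulates all amounts seen so far
// within each col_1 partition; concat_ws joins them with "/".
val result = spark.sql("""
  SELECT col_1, col_2, amount,
         concat_ws('/',
           collect_list(amount) OVER (
             PARTITION BY col_1
             ORDER BY col_2
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           )
         ) AS output
  FROM input_table
""")

result.show(false)
```

Note that rows sharing the same `col_1` and `col_2` (for example the repeated 17 | 17 rows) have no deterministic order under `ORDER BY col_2` alone, so for truly dynamic data an explicit ordering column (such as a row id or timestamp) may be needed to guarantee the concatenation order.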