Dusty
Dusty

Reputation: 181

Adding with Window Functions, from specific value

I am struggling with a (Py)Spark problem.

I have a column "col" in an ordered dataframe and need a way of adding up the elements from 0. What I need is the column "sum_from_0". I tried it with window functions but did not succeed. Any idea on how to solve this task would be appreciated. Thank you in advance.

col sum_from_0
0   None
0   None
1   1
2   3
1   4
4   8
3   11
0   None
0   None
0   None
1   1
2   3
3   6
3   9
2   11
0   None
0   None

Upvotes: 0

Views: 353

Answers (1)

Lamanus
Lamanus

Reputation: 13591

There is no ordering column, so I made it first and add some temp columns to separate sum groups. After that, sum over the group partition and order by id window such as

import org.apache.spark.sql.expressions.Window

val w1 = Window.orderBy("id")
val w2 = Window.partitionBy("group").orderBy("id")

df.withColumn("id", monotonically_increasing_id)
  .withColumn("zero", (col("col") === 0).cast("int"))
  .withColumn("group", sum("zero").over(w1))
  .withColumn("sum_from_0", sum("col").over(w2))
  .orderBy("id")
  .drop("id", "group", "zero")
  .show(20, false)

that gives the results:

+---+----------+
|col|sum_from_0|
+---+----------+
|0  |0         |
|0  |0         |
|1  |1         |
|2  |3         |
|1  |4         |
|4  |8         |
|3  |11        |
|0  |0         |
|0  |0         |
|0  |0         |
|1  |1         |
|2  |3         |
|3  |6         |
|3  |9         |
|2  |11        |
|0  |0         |
|0  |0         |
+---+----------+

Upvotes: 3

Related Questions