Reputation: 31
I'm working with Apache Spark 2.3.0 (cloudera4) and I have an issue processing a DataFrame.
I've got this input dataframe:
+---+---+----+
| id| d1|  d2|
+---+---+----+
|  1|   | 2.0|
|  2|   |-4.0|
|  3|   | 6.0|
|  4|3.0|    |
+---+---+----+
And I need this output:
+---+---+----+----+
| id| d1|  d2|   r|
+---+---+----+----+
|  1|   | 2.0| 7.0|
|  2|   |-4.0| 5.0|
|  3|   | 6.0| 9.0|
|  4|3.0|    | 3.0|
+---+---+----+----+
In other words, iterating from the row with the largest id (4): put that row's d1 value in the r column, then for the next row down (3) put r[4] + d2[3] in r, and so on (see the plain Scala sketch below).
Is it possible to do something like that in Spark? I need the value computed for one row in order to calculate the value of another row.
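To make the iteration concrete, here is the same logic in plain (non-Spark) Scala; the Rec class and the hard-coded values are just a mock-up of the table above:
case class Rec(id: Int, d1: Option[Double], d2: Option[Double])

val rows = Seq(
  Rec(1, None, Some(2.0)), Rec(2, None, Some(-4.0)),
  Rec(3, None, Some(6.0)), Rec(4, Some(3.0), None))

// Walk from the largest id down, carrying the running total:
// r(4) = d1(4), then r(i) = r(i + 1) + d2(i).
val r = rows.sortBy(-_.id)
  .scanLeft(0.0)((acc, rec) => acc + rec.d1.orElse(rec.d2).getOrElse(0.0))
  .tail
// r == List(3.0, 9.0, 5.0, 7.0), i.e. the r values for ids 4, 3, 2, 1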
Upvotes: 1
Views: 784
Reputation: 12522
How about this? The important bit is sum($"r1").over(Window.orderBy($"id".desc)),
which calculates a cumulative sum of a column. Other than that, I'm creating a couple of helper columns to get the max id and to get the ordering right.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{max, sum, when}
import spark.implicits._  // for the $"..." column syntax; assumes a SparkSession named spark

val result = df
  // helper: the max id, broadcast to every row via an unbounded window
  .withColumn("max_id", max($"id").over(Window.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)))
  // helper: take d1 on the max-id row, d2 everywhere else
  .withColumn("r1", when($"id" === $"max_id", $"d1").otherwise($"d2"))
  // cumulative sum of r1, walking from the largest id downwards
  .withColumn("r", sum($"r1").over(Window.orderBy($"id".desc)))
  .drop($"max_id").drop($"r1")
  .orderBy($"id")
result.show
+---+----+----+---+
| id|  d1|  d2|  r|
+---+----+----+---+
|  1|null| 2.0|7.0|
|  2|null|-4.0|5.0|
|  3|null| 6.0|9.0|
|  4| 3.0|null|3.0|
+---+----+----+---+
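As a side note: since d1 is only populated on the max-id row in the sample data, the two helper columns could arguably be collapsed into a single coalesce. A minimal sketch, assuming that property holds for your real data:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{coalesce, sum}

// Take d2 where present, fall back to d1 (assumes d1 is non-null
// only on the max-id row), then run the same descending cumulative sum.
val result2 = df
  .withColumn("r", sum(coalesce($"d2", $"d1")).over(Window.orderBy($"id".desc)))
  .orderBy($"id")
result2.show
Either way, note that a window with an orderBy but no partitionBy moves all rows to a single partition, which is fine at this size but can become a bottleneck on large data.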
Upvotes: 1