Reputation: 75
I am reviewing some code and would like a bit more clarity.
Here is my PySpark Dataframe:
| YEAR_A | YEAR_B | AMOUNT |
|--------|--------|--------|
| 2000   | 2001   | 5      |
| 2000   | 2000   | 4      |
| 2000   | 2001   | 3      |
I initiate a window function:
window = Window.partitionBy('YEAR_A')
Then I would like some help understanding the following part, especially what happens after the `.over(window)`:

df = (df.withColumn("newcolumn", F.sum("AMOUNT").over(window) * (F.col("YEAR_B") == F.col("YEAR_A")).cast("integer")))
Is it supposed to add a "newcolumn" to my dataframe containing the sum of "AMOUNT" for the current YEAR_A, written only when "YEAR_A" equals "YEAR_B" (and nan otherwise)? Or am I missing something?
Upvotes: 0
Views: 131
Reputation: 24386
`(F.col("YEAR_B") == F.col("YEAR_A"))` compares the two columns. If the values in a row are equal, you get `True`; if they are not equal, you get `False`.

`.cast("integer")` turns that boolean into an integer: `True` becomes `1`, `False` becomes `0`.
`F.sum("AMOUNT").over(window) *` — you multiply the result of the window function by the result above. Multiplying by `1` keeps the value of the window function; multiplying by `0` gives `0`.
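To make the three steps concrete, here is a plain-Python sketch of what the expression computes on your sample data (plain Python rather than PySpark so it runs anywhere; the single-partition assumption holds because every row has YEAR_A = 2000):

```python
# Sample rows from the question's dataframe.
rows = [
    {"YEAR_A": 2000, "YEAR_B": 2001, "AMOUNT": 5},
    {"YEAR_A": 2000, "YEAR_B": 2000, "AMOUNT": 4},
    {"YEAR_A": 2000, "YEAR_B": 2001, "AMOUNT": 3},
]

# Step 1: the window sum of AMOUNT per YEAR_A partition
# (what F.sum("AMOUNT").over(window) produces for each row).
partition_sums = {}
for r in rows:
    partition_sums[r["YEAR_A"]] = partition_sums.get(r["YEAR_A"], 0) + r["AMOUNT"]

# Steps 2 and 3: cast the comparison to 0/1, then multiply.
newcolumn = [
    partition_sums[r["YEAR_A"]] * int(r["YEAR_B"] == r["YEAR_A"])
    for r in rows
]

print(newcolumn)  # [0, 12, 0]
```

Only the middle row has YEAR_A equal to YEAR_B, so it keeps the partition sum (5 + 4 + 3 = 12) and the other rows get 0.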
Nothing in that expression produces `nan`; Spark does not generally return `nan` here. The rows where the comparison is false get `0`, not `nan`.
Upvotes: 1