Reputation: 958
I am trying to figure out a way to use a monthly range with a window partitioned on a date column, compared against today's date.
Suppose today is 21 Jan 2021: I need to derive the month and year as 202101 and then build a range of months from that deduced value.
For example, 2 months back from 202101 is 202011.
Then, on the rows that fall inside that range, I need to aggregate another column.
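A minimal sketch of the month arithmetic I have in mind, assuming PySpark's built-in date functions (the spark session and the column aliases are just illustrative):
from pyspark.sql import functions as F

# Derive yyyyMM for today and for two months back (illustrative only).
spark.range(1).select(
    F.date_format(F.current_date(), "yyyyMM").alias("this_month"),                         # e.g. 202101
    F.date_format(F.add_months(F.current_date(), -2), "yyyyMM").alias("two_months_back")   # e.g. 202011
).show()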
So if the input is:
Id, date, price
1, 1-10-2021, 5
1, 1-11-2021, 6
1, 1-09-2021, 10
and the current date is 20 Jan 2021, then:
if I want the aggregation over the last two months (Dec and Nov), the output should be 6;
if I want the aggregation over the last three months (Dec, Nov, Oct), the output should be 11.
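Note that my sample dates are strings in d-MM-yyyy form, so presumably they would first be parsed into a proper date column; a hypothetical one-liner, assuming the DataFrame is called df:
from pyspark.sql import functions as F

# Parse the sample's d-MM-yyyy strings into a real date column (illustrative).
df = df.withColumn("date", F.to_date(F.col("date"), "d-MM-yyyy"))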
Upvotes: 0
Views: 658
Reputation: 32700
You don't need a Window in this case. Simply group by id
and use a conditional sum aggregation:
from pyspark.sql import functions as F

df = spark.createDataFrame([
    (1, "2021-10-01", 5), (1, "2021-11-01", 6),
    (1, "2021-09-01", 10), (2, "2021-12-01", 9)
], ["Id", "date", "price"])

nb_last_months = 2

# The cutoff is the first day of the month, nb_last_months before the current month.
# when() returns null for rows before the cutoff, and sum() ignores nulls,
# so only the prices inside the range are added up.
df1 = df.groupBy("id").agg(
    F.sum(
        F.when(
            F.col("date") >= F.add_months(F.date_trunc("month", F.current_date()), -nb_last_months),
            F.col("price")
        )
    ).alias(f"sum_last_{nb_last_months}_months")
)

df1.show()
# Which rows match depends on current_date() at run time; the output below
# corresponds to a run in January 2022 (cutoff 2021-11-01).
#+---+-----------------+
#| id|sum_last_2_months|
#+---+-----------------+
#| 1| 6|
#| 2| 9|
#+---+-----------------+
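One detail worth noting about the pattern above: F.when without an otherwise yields null for rows outside the range, and F.sum skips nulls, but if a group has no matching rows at all the sum itself comes back as null rather than 0. If you prefer 0, a small variation (same assumptions as the code above) is to wrap the sum in coalesce:
df1 = df.groupBy("id").agg(
    F.coalesce(
        F.sum(
            F.when(
                F.col("date") >= F.add_months(F.date_trunc("month", F.current_date()), -nb_last_months),
                F.col("price")
            )
        ),
        F.lit(0)
    ).alias(f"sum_last_{nb_last_months}_months")
)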
Upvotes: 1