EXODIA

Reputation: 958

Custom month range with current date in window function

I am trying to figure out a way to apply a monthly range, relative to today's date, with a window partition on a date column.

Suppose today is 21 Jan 2021. I need to derive the month and year as 202101, and then build a range of months from that deduced value.

For example, going back 2 months from 202101 gives 202011.

Then I need to aggregate another column over the rows that fall within that range.
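For illustration, the month-key arithmetic described above can be sketched in plain Python; `month_key` is a hypothetical helper, not part of any library:

```python
from datetime import date

def month_key(d, months_back=0):
    # Convert a date to a YYYYMM integer for the month that is
    # `months_back` months before d's month.
    total = d.year * 12 + (d.month - 1) - months_back
    y, m = divmod(total, 12)
    return y * 100 + (m + 1)

print(month_key(date(2021, 1, 21)))     # 202101
print(month_key(date(2021, 1, 21), 2))  # 202011
```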

So if input is:

Id, date, price
1, 1-10-2021, 5
1, 1-11-2021, 6
1, 1-09-2021, 10

The current date is 20 Jan 2021.

If I want the aggregation over the last two months (Dec and Nov), the output should be 5.

If I want the aggregation over the last three months (Dec, Nov, Oct), the output should be 11.

Upvotes: 0

Views: 658

Answers (1)

blackbishop

Reputation: 32700

You don't need a Window in this case. Simply group by id and use a conditional sum aggregation:

from pyspark.sql import functions as F

df = spark.createDataFrame([
    (1, "2021-10-01", 5), (1, "2021-11-01", 6),
    (1, "2021-09-01", 10), (2, "2021-12-01", 9)
], ["Id", "date", "price"])

nb_last_months = 2

df1 = df.groupBy("id").agg(
    F.sum(
        F.when(
            # keep rows on or after the first day of the month that is
            # nb_last_months months before the current month
            F.col("date") >= F.add_months(F.date_trunc("month", F.current_date()), - nb_last_months),
            F.col("price")
        )
    ).alias(f"sum_last_{nb_last_months}_months")
)

df1.show()
#+---+-----------------+
#| id|sum_last_2_months|
#+---+-----------------+
#|  1|                6|
#|  2|                9|
#+---+-----------------+
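To see what the conditional sum computes, here is a plain-Python sketch of the same logic, with no Spark required. `cutoff` and `sum_last_months` are hypothetical helpers, and a fixed "today" of 2021-12-20 is assumed so the result is deterministic:

```python
from datetime import date
from collections import defaultdict

def cutoff(today, nb_last_months):
    # First day of the month nb_last_months months before today's month,
    # mirroring add_months(date_trunc("month", current_date()), -n).
    total = today.year * 12 + (today.month - 1) - nb_last_months
    y, m = divmod(total, 12)
    return date(y, m + 1, 1)

rows = [(1, date(2021, 10, 1), 5), (1, date(2021, 11, 1), 6),
        (1, date(2021, 9, 1), 10), (2, date(2021, 12, 1), 9)]

def sum_last_months(rows, today, nb_last_months):
    lo = cutoff(today, nb_last_months)
    totals = defaultdict(int)
    for rid, d, price in rows:
        if d >= lo:  # the F.when(...) condition
            totals[rid] += price
    return dict(totals)

print(sum_last_months(rows, date(2021, 12, 20), 2))  # {1: 11, 2: 9}
```

Note that the actual Spark result depends on when you run it, since the cutoff is relative to `current_date()`.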

Upvotes: 1
