dMb

Reputation: 9337

PySpark: date interval in PySpark's sequence function?

I want to generate a DataFrame with dates using PySpark's sequence() function (not looking for work-arounds using other methods). I got this working with the default step of 1. But how do I generate a sequence with dates, say, 1 week apart? I can't figure out what type/value to feed into the step parameter of the function.

from datetime import date
from pyspark.sql.functions import explode, lit, sequence, to_date

df = (spark.createDataFrame([{'date': 1}])
      .select(explode(sequence(to_date(lit('2021-01-01')), to_date(lit(date.today())))).alias('calendar_date')))
df.show()

Upvotes: 2

Views: 4763

Answers (1)

Gaarv

Reputation: 824

You have to pass an INTERVAL literal as the step argument. Building on your code:

from datetime import date
from pyspark.sql.functions import explode, expr, lit, sequence, to_date

df = (
    spark
    .createDataFrame([{'date': 1}])
    .select(
        explode(sequence(
            to_date(lit('2021-01-01')), # start
            to_date(lit(date.today())), # stop
            expr("INTERVAL 1 WEEK")     # step
        )).alias('calendar_date')
    )
)

df.show()

https://spark.apache.org/docs/latest/sql-ref-literals.html#interval-literal
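
A minimal sketch to verify the weekly step, assuming a local SparkSession named `spark`; the end date '2021-01-29' is pinned here purely for illustration so the output is deterministic:

from pyspark.sql.functions import explode, expr, lit, sequence, to_date

(spark.createDataFrame([{'date': 1}])
 .select(explode(sequence(
     to_date(lit('2021-01-01')),   # start
     to_date(lit('2021-01-29')),   # stop (fixed for a reproducible example)
     expr("INTERVAL 1 WEEK")       # step
 )).alias('calendar_date'))
 .show())

# Expected output: one row per week, stop date included since it falls exactly on a step
# +-------------+
# |calendar_date|
# +-------------+
# |   2021-01-01|
# |   2021-01-08|
# |   2021-01-15|
# |   2021-01-22|
# |   2021-01-29|
# +-------------+

Other interval units (e.g. INTERVAL 1 MONTH, INTERVAL 3 DAY) work the same way as the step.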

Upvotes: 3
