Reputation: 9337
I want to generate a DataFrame with dates using PySpark's sequence() function (not looking for work-arounds using other methods). I got this working with the default step of 1, but how do I generate a sequence of dates, say, 1 week apart? I can't figure out what type/value to feed into the step parameter of the function.
from datetime import date
from pyspark.sql.functions import explode, lit, sequence, to_date

df = (spark.createDataFrame([{'date': 1}])
      .select(explode(sequence(to_date(lit('2021-01-01')),
                               to_date(lit(date.today())))).alias('calendar_date')))
df.show()
Upvotes: 2
Views: 4763
Reputation: 824
You have to pass an INTERVAL literal as the step. Adapting your code:
from datetime import date
from pyspark.sql.functions import explode, expr, lit, sequence, to_date

df = (
    spark
    .createDataFrame([{'date': 1}])
    .select(
        explode(sequence(
            to_date(lit('2021-01-01')),   # start
            to_date(lit(date.today())),   # stop
            expr("INTERVAL 1 WEEK")       # step
        )).alias('calendar_date')
    )
)
df.show()
https://spark.apache.org/docs/latest/sql-ref-literals.html#interval-literal
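If you want to sanity-check what a 1-week step produces without spinning up Spark, the same stepping logic can be sketched in plain Python with datetime. (The helper name date_sequence is made up for illustration; it mirrors sequence()'s behavior of including the stop value only when a step lands exactly on it.)

```python
from datetime import date, timedelta

def date_sequence(start, stop, step=timedelta(weeks=1)):
    # Hypothetical stand-in for Spark's sequence(start, stop, INTERVAL 1 WEEK):
    # emit start, start+step, ... while the value is <= stop.
    out = []
    current = start
    while current <= stop:
        out.append(current)
        current += step
    return out

# Five Fridays: 2021-01-01, -08, -15, -22, -29 (2021-02-05 overshoots the stop)
print(date_sequence(date(2021, 1, 1), date(2021, 2, 1)))
```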
Upvotes: 3