Reputation: 5660
I have a script where I'm pulling data into a pyspark DataFrame using spark sql. The script is shown below:
from pyspark.sql import SparkSession
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df_query = """
select
*
from schema.table
where start_date between date '2019-03-01' and date '2019-03-07'
"""
df = spark.sql(df_query)
Currently, the script pulls data for one particular week. However, I want to run this query for every week in a date range. How can I do that?
Upvotes: 0
Views: 2565
Reputation: 19365
You can use the timedelta class for that:
import datetime

startDate = datetime.datetime.strptime('2019-03-01', "%Y-%m-%d")
maxDate = datetime.datetime.strptime('2019-04-03', "%Y-%m-%d")

while startDate <= maxDate:
    # between is inclusive on both ends, so add 6 days for a 7-day window
    endDate = startDate + datetime.timedelta(days=6)
    df_query = """
    select
    *
    from schema.table
    where start_date between date '{}' and date '{}'
    """.format(startDate.date(), endDate.date())
    print(df_query)
    # start the next window the day after the current one ends
    startDate = endDate + datetime.timedelta(days=1)
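If you want to execute each weekly query rather than just print it, one way is to collect the week boundaries first and then loop over them. A minimal sketch, using a hypothetical helper `weekly_ranges` (not part of any library) and plain `datetime.date` objects; the `spark.sql` call is commented out since it needs a live session:

```python
import datetime

def weekly_ranges(start, end):
    """Yield (week_start, week_end) date pairs covering [start, end],
    each spanning 7 days inclusive."""
    while start <= end:
        week_end = start + datetime.timedelta(days=6)
        yield start, week_end
        start = week_end + datetime.timedelta(days=1)

ranges = list(weekly_ranges(datetime.date(2019, 3, 1),
                            datetime.date(2019, 4, 3)))
# e.g. the first pair is (2019-03-01, 2019-03-07)

# With a SparkSession available, you could then run one query per week:
# dfs = [spark.sql(df_query.format(s, e)) for s, e in ranges]
```

You could union the resulting DataFrames with `functools.reduce(DataFrame.union, dfs)` if you need a single result, though running one query over the whole range is usually cheaper when that's an option.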
Upvotes: 2