Reputation: 6139
I am trying to get a date range from the current date back to the previous 3 years, and the 3-years-ago data should start on Jan 01. Below is the code snippet that I have tried.
import pyspark.sql.functions as F

dateDF = spark.sql("select current_date() as current_date, add_months(current_date(), -36) as end_date")
dateDF = dateDF.withColumn("end_date_first_date", F.trunc("end_date", "month")).withColumn("end_date_first_date_first_month", F.lit(''))
dateDF.show()
+------------+----------+-------------------+-------------------------------+
|current_date|  end_date|end_date_first_date|end_date_first_date_first_month|
+------------+----------+-------------------+-------------------------------+
|  2021-04-09|2018-04-09|         2018-04-01|                               |
+------------+----------+-------------------+-------------------------------+
Here I was able to get the first date of the month, but how can I get the first month of the year? Are there any predefined functions?
Expected output
+------------+----------+-------------------+-------------------------------+
|current_date|  end_date|end_date_first_date|end_date_first_date_first_month|
+------------+----------+-------------------+-------------------------------+
|  2021-04-09|2018-04-09|         2018-04-01|                     2018-01-01|
+------------+----------+-------------------+-------------------------------+
Upvotes: 0
Views: 128
Reputation: 42352
Just use year instead of month in F.trunc:
dateDF = dateDF.withColumn(
    "end_date_first_date",
    F.trunc("end_date", "month")
).withColumn(
    "end_date_first_date_first_month",
    F.trunc("end_date", "year")
)
Upvotes: 2