sparc

Reputation: 429

How to get the first day of the year in PySpark

I have a date variable that I need to pass to various functions.

For example, if the variable holds the date 12/09/2021, it should return 01/01/2021.

How do I get the first day of the year in PySpark?

Upvotes: 2

Views: 3112

Answers (3)

Cleared

Reputation: 2590

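You can use the trunc function, which truncates a date to the given unit (here, the year).

from pyspark.sql import functions as f

# `spark` is assumed to be an existing SparkSession.
# One-row DataFrame with no columns, used only to hold the demo columns.
df = spark.createDataFrame([()], [])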
(
    df
    .withColumn('current_date', f.current_date())
    .withColumn("year_start", f.trunc("current_date", "year"))
    .show()
)

# Output
+------------+----------+
|current_date|year_start|
+------------+----------+
|  2022-02-23|2022-01-01|
+------------+----------+

Upvotes: 2

Vaebhav

Reputation: 5032

You can achieve this by combining date_trunc with to_date: date_trunc returns a Timestamp rather than a Date, so to_date converts the result back to a Date.

Data Preparation

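import pandas as pd
from pyspark.sql import functions as F

df = pd.DataFrame({
    'Date': ['2021-01-23', '2002-02-09', '2009-09-19'],
})

# `sql` is assumed to be an existing SparkSession (or SQLContext)
sparkDF = sql.createDataFrame(df)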

sparkDF.show()

+----------+
|      Date|
+----------+
|2021-01-23|
|2002-02-09|
|2009-09-19|
+----------+

Date Trunc & To Date

sparkDF = sparkDF.withColumn('first_day_year_dt',F.to_date(F.date_trunc('year',F.col('Date')),'yyyy-MM-dd'))\
                 .withColumn('first_day_year_timestamp',F.date_trunc('year',F.col('Date')))

sparkDF.show()

+----------+-----------------+------------------------+
|      Date|first_day_year_dt|first_day_year_timestamp|
+----------+-----------------+------------------------+
|2021-01-23|       2021-01-01|     2021-01-01 00:00:00|
|2002-02-09|       2002-01-01|     2002-01-01 00:00:00|
|2009-09-19|       2009-01-01|     2009-01-01 00:00:00|
+----------+-----------------+------------------------+

Upvotes: 0

Luiz Viola

Reputation: 2436

x = '12/09/2021'

'01/01/' + x[-4:]
# Output: '01/01/2021'
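
The slicing above relies on the variable always being a well-formed string that ends in a four-digit year. If that is not guaranteed, a slightly more defensive sketch is to parse the value with the standard library's datetime module first. The helper name first_day_of_year and the default format '%d/%m/%Y' are assumptions for illustration, not something stated in the question (12/09/2021 could equally be month-first, which gives the same result here).

```python
from datetime import datetime


def first_day_of_year(date_str: str, fmt: str = "%d/%m/%Y") -> str:
    """Return January 1st of the year in date_str, formatted with fmt.

    Hypothetical helper: the name and the default format are assumptions.
    """
    # strptime raises ValueError if date_str does not match fmt
    parsed = datetime.strptime(date_str, fmt)
    # Keep the year, reset month and day to the start of the year
    return parsed.replace(month=1, day=1).strftime(fmt)


print(first_day_of_year("12/09/2021"))  # prints 01/01/2021
```

The returned string can then be passed to other functions just like the original variable, and a different fmt (for example "%Y-%m-%d") can be supplied if the input uses another layout.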

Upvotes: 0
