sparc

Reputation: 429

Convert yyyyMM to end of month date using PySpark

I have a column in a PySpark DataFrame that stores dates as integers in yyyyMM format, e.g. 202203. I want to convert each value to the last day of that month, e.g. 2022-03-31. How can I do this?

Upvotes: 0

Views: 1010

Answers (1)

vladsiv

Reputation: 2936

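First cast the column to a string, then parse it with to_date using the yyyyMM pattern (this gives the first day of the month), and finally apply last_day to get the last day of that month.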

Example:

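from pyspark.sql import SparkSession
from pyspark.sql import functions as F


spark = SparkSession.builder.getOrCreate()

# Sample data: x holds the month as an integer in yyyyMM format
data = [{"x": 202203}]
df = spark.createDataFrame(data=data)

# cast to string -> parse as yyyyMM (first day of the month) -> last day of the month
df = df.withColumn("date", F.last_day(F.to_date(F.col("x").cast("string"), "yyyyMM")))
df.show(10)
df.printSchema()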

Output:

+------+----------+
|     x|      date|
+------+----------+
|202203|2022-03-31|
+------+----------+

root
 |-- x: long (nullable = true)
 |-- date: date (nullable = true)
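
If you want to sanity-check a few values without starting Spark, here is a minimal plain-Python sketch of the same conversion. The function name yyyymm_to_month_end is my own (hypothetical) and is not part of PySpark; it only uses the standard library and assumes the input is always a yyyyMM integer with a month from 01 to 12.

```python
import calendar
from datetime import date


def yyyymm_to_month_end(value: int) -> date:
    """Return the last calendar day of the month encoded as a yyyyMM integer.

    Hypothetical helper for checking results; not a PySpark API.
    """
    # Split 202203 into year 2022 and month 3
    year, month = divmod(value, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"not a valid yyyyMM value: {value}")
    # monthrange returns (weekday of the first day, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, days_in_month)


print(yyyymm_to_month_end(202203))  # 2022-03-31
print(yyyymm_to_month_end(202402))  # 2024-02-29 (leap year)
```

For the DataFrame itself I would still use the built-in to_date and last_day functions shown above rather than wrapping something like this in a UDF.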

Upvotes: 1
