Reputation: 429
I have a column in a PySpark DataFrame that holds dates as integers in yyyyMM format, e.g. 202203. I want to convert each value to the end-of-month date, e.g. 2022-03-31. How do I achieve this?
Upvotes: 0
Views: 1010
Reputation: 2936
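First cast the column to a string, then use to_date with the "yyyyMM" pattern to parse it into a date (the first day of that month), and finally apply last_day to get the end of the month.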
Example:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
spark = SparkSession.builder.getOrCreate()
data = [{"x": 202203}]
df = spark.createDataFrame(data=data)
# Cast to string, parse as yyyyMM (first day of the month), then move to the month's last day
df = df.withColumn("date", F.last_day(F.to_date(F.col("x").cast("string"), "yyyyMM")))
df.show(10)
df.printSchema()
Output:
+------+----------+
|     x|      date|
+------+----------+
|202203|2022-03-31|
+------+----------+
root
 |-- x: long (nullable = true)
 |-- date: date (nullable = true)
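If you want to sanity-check the expected values outside Spark, the same end-of-month logic can be sketched in plain Python. This is only a cross-check under the assumption that the integer is always a valid yyyyMM value; the helper name end_of_month is made up for illustration and is not part of PySpark.

```python
import calendar
from datetime import date


def end_of_month(yyyymm: int) -> date:
    """Hypothetical helper: last calendar day of the month encoded as integer yyyyMM."""
    year, month = divmod(yyyymm, 100)  # 202203 -> (2022, 3)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {yyyymm}")
    # monthrange returns (weekday of the first day, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, days_in_month)


print(end_of_month(202203))  # 2022-03-31
print(end_of_month(202402))  # 2024-02-29 (leap year)
```

For the DataFrame itself, the built-in last_day approach above is probably still the better choice, since it runs inside Spark without a Python UDF.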
Upvotes: 1