Alastair

Reputation: 17

Converting a column to date format (DDMMMyyyy) in PySpark: the whole date column comes back null

When I convert the column week_end_date from string to date, every value in the resulting column is null.

from pyspark.sql.functions import unix_timestamp, from_unixtime

df = spark.read.csv('dbfs:/location/abc.txt', header=True)

df2 = df.select(
    'week_end_date',
    from_unixtime(unix_timestamp('week_end_date', 'MM-dd-yyyy')).alias('date')
)

# show() prints the rows and returns None, so call it on the DataFrame itself
df2.show()


Upvotes: 1

Views: 1860

Answers (1)

mck

Reputation: 42422

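Your date format is incorrect. The pattern has to describe the strings as they appear in the column, not the format you want to end up with, so it should be ddMMMyy. You can also use to_date directly instead of the unix timestamp functions.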

import pyspark.sql.functions as F

df = spark.read.csv('dbfs:/location/abc.txt', header=True)

df2 = df.select(
    'week_end_date', 
    F.to_date('week_end_date', 'ddMMMyy').alias('date')
)
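To see why the original pattern gives nothing back, here is a minimal plain-Python sketch of the same idea. It is not Spark code: it uses datetime.strptime, whose directives (%d, %b, %Y) are different from Spark's pattern letters, and the sample value "24JAN2015" and the helper parse_or_none are made up for illustration. It also assumes an English locale for the month abbreviation. A mismatched pattern is mapped to None here, which mirrors the nulls seen in the question.

```python
from datetime import date, datetime
from typing import Optional


def parse_or_none(value: str, pattern: str) -> Optional[date]:
    """Parse value with a strptime pattern; return None when it does not match."""
    try:
        return datetime.strptime(value, pattern).date()
    except ValueError:
        # The pattern does not describe the input string
        return None


# Hypothetical value, assumed to look like the DDMMMyyyy data in the question
sample = "24JAN2015"

# Pattern that describes the wanted output rather than the input
print(parse_or_none(sample, "%m-%d-%Y"))

# Pattern that describes the input as it is actually written
print(parse_or_none(sample, "%d%b%Y"))
```

The first call fails to match and the second one parses, which is the same distinction as between 'MM-dd-yyyy' and the corrected pattern in the Spark code.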

If you want the result displayed as MM-dd-yyyy, you can apply date_format to the parsed date. Note that date_format returns a string column, not a date:

df2 = df.select(
    'week_end_date', 
    F.date_format(F.to_date('week_end_date', 'ddMMMyy'), 'MM-dd-yyyy').alias('date')
)
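The two-step shape (parse with the input pattern, then format with the output pattern) can be sketched in plain Python as well. Again this is only an analogy under assumptions: strptime/strftime directives stand in for Spark's pattern letters, and the helper reformat_date and the sample value are hypothetical.

```python
from datetime import datetime


def reformat_date(value: str, in_pattern: str, out_pattern: str) -> str:
    """Parse value using in_pattern, then render it using out_pattern."""
    parsed = datetime.strptime(value, in_pattern)  # step 1: string -> datetime
    return parsed.strftime(out_pattern)            # step 2: datetime -> string


# Hypothetical input written as day, month abbreviation, four-digit year
result = reformat_date("24JAN2015", "%d%b%Y", "%m-%d-%Y")
print(result, type(result).__name__)
```

The result is a string again, so if you need to sort or filter by date it is usually better to keep the parsed date column and only format it for display.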

Upvotes: 1
