nam

Reputation: 23749

pyspark - date column with null values not filling with old date

In the following, Hiring_date is of DateType. df2 fills the null dates with '1900-01-01', but the actual data uses the mm/dd/yyyy format, so I want the null values filled as 01/01/1900. I tried the code shown in the second block below, but the Hiring_date column still showed the null values as NULL.

Question: What am I missing, and how can we fix it? I guess the more important part of the question is: why does Code 2 ignore the 01/01/1900 value?

Code 1: Fills the null date values as '1900-01-01'. But I need the 01/01/1900 format

from pyspark.sql.types import DateType
df1 = df.withColumn("Hiring_date", df.Hiring_date.cast(DateType()))
df2 = df1.fillna( {'Hiring_date': '1900-01-01'} )

Code 2: Leaves the null date values as NULL. But I need them to display as 01/01/1900

df1 = df.withColumn("Hiring_date", df.Hiring_date.cast(DateType()))
df2 = df1.fillna( {'Hiring_date': '01/01/1900'} )

Upvotes: 0

Views: 1243

Answers (1)

Chris

Reputation: 16147

I believe the only valid string format for inputting DateType is yyyy-MM-dd, which explains why your first code works. In the second, '01/01/1900' presumably cannot be converted to a date, so the nulls stay as they are.

You seem to be wanting a string representation of the date, which you can achieve with:

from pyspark.sql.functions import col, date_format
df.withColumn('Hiring_date', date_format(col('Hiring_date'), 'MM/dd/yyyy'))

Upvotes: 1
