Santhosh Chakka

Reputation: 335

How to load date with custom format in Spark

I have a column with values like "Tuesday, 09-Aug-11 21:13:26 GMT" and I want to define a schema for it in Spark, but neither TimestampType nor DateType is able to recognize this date format.

After loading the data into a DataFrame with that column declared as TimestampType or DateType, I see only NULL values in that column.

Is there any alternative for this?

Upvotes: 0

Views: 253

Answers (1)

s.polam

Reputation: 10372

One option is to read "Tuesday, 09-Aug-11 21:13:26 GMT" as a string column and then convert the string to a timestamp with an explicit pattern, something like below.

df.show(truncate=false)
+-------------------------------+
|dt                             |
+-------------------------------+
|Tuesday, 09-Aug-11 21:13:26 GMT|
+-------------------------------+

df.withColumn("dt",to_timestamp(col("dt"),"E, d-MMM-y H:m:s z")).show(truncate=false) // Note: the GMT value is displayed converted to the local time zone (IST here).

+-------------------+
|dt                 |
+-------------------+
|2011-08-10 02:43:26|
+-------------------+
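
For completeness, here is a minimal self-contained sketch of the same approach, assuming a local SparkSession and a one-row in-memory DataFrame standing in for the real data (the object name `CustomDateFormat` and the app name are made up for illustration). Two settings go beyond the snippet above and are assumptions about your environment: `spark.sql.session.timeZone` is set to UTC so that `show` renders the timestamp in GMT rather than the machine's local zone, and `spark.sql.legacy.timeParserPolicy` is set to LEGACY because, as far as I know, the default parser in Spark 3.x does not accept `E` in parsing patterns. On Spark 2.x the second setting should not be needed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_timestamp}

// Hypothetical object name, used only for this illustration.
object CustomDateFormat {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("custom-date-format") // hypothetical app name
      .master("local[*]")            // assumption: local test run
      .getOrCreate()
    import spark.implicits._

    // Assumption: display timestamps in UTC instead of the JVM's local zone.
    spark.conf.set("spark.sql.session.timeZone", "UTC")

    // Assumption: only relevant on Spark 3.x, where the default parser
    // rejects 'E' in parsing patterns; LEGACY restores the older behaviour.
    spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

    // The column is kept as a plain string first, as suggested above.
    val df = Seq("Tuesday, 09-Aug-11 21:13:26 GMT").toDF("dt")

    // Convert the string column to a timestamp using the custom pattern.
    val parsed = df.withColumn("dt", to_timestamp(col("dt"), "E, d-MMM-y H:m:s z"))

    parsed.printSchema()
    parsed.show(truncate = false)

    spark.stop()
  }
}
```

If the column comes from a file, the same idea applies: declare it as StringType in the schema and apply the `withColumn` conversion after loading.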

Upvotes: 3
