stack0114106
stack0114106

Reputation: 8721

spark date format MMM dd, yyyy hh:mm:ss AM to timestamp in df

I need to convert a descriptive date format from a log file "MMM dd, yyyy hh:mm:ss AM/PM" to the spark timestamp datatype. I tried something like below, but it is giving null.

val df = Seq(("Nov 05, 2018 02:46:47 AM"),("Nov 5, 2018 02:46:47 PM")).toDF("times")
df.withColumn("time2",date_format('times,"MMM dd, yyyy HH:mm:ss AM")).show(false)

+------------------------+-----+
|times                   |time2|
+------------------------+-----+
|Nov 05, 2018 02:46:47 AM|null |
|Nov 5, 2018 02:46:47 PM |null |
+------------------------+-----+

Expected output

+------------------------+----------------------------+
|times                   |time2                       |
+------------------------+-----+----------------------+
|Nov 05, 2018 02:46:47 AM|2018-11-05 02:46:47.000000" |
|Nov 5, 2018 02:46:47 PM |2018-11-05 14:46:47.000000" |
+------------------------+-----+----------------------+

What is the proper format for converting this?. Note that DD may be having leading zeroes.

Upvotes: 2

Views: 17741

Answers (4)

Arif Rizwan
Arif Rizwan

Reputation: 41

We can use splitby

Select date.split('-')[2]||'-'||case when length(date.split('-')[0]) = 1 then '0'||date.split('-')[0] else  date.split('-')[0] end  || case when length(date.split('-')[1]) = 1 then '0'||date.split('-')[1] else  date.split('-')[1] end] 
Date = date_column

Date format yyyy-mm-dd
Delimiter can be different.

Without any date format

Upvotes: -1

m-bhole
m-bhole

Reputation: 1189

Here is your answer

val df = Seq(("Nov 05, 2018 02:46:47 AM"),("Nov 5, 2018 02:46:47 PM")).toDF("times")

scala> df.withColumn("times2", from_unixtime(unix_timestamp(col("times"), "MMM d, yyyy hh:mm:ss a"),"yyyy-MM-dd HH:mm:ss.SSSSSS")).show(false)
    +------------------------+--------------------------+
    |times                   |times2                    |
    +------------------------+--------------------------+
    |Nov 05, 2018 02:46:47 AM|2018-11-05 02:46:47.000000|
    |Nov 5, 2018 02:46:47 PM |2018-11-05 14:46:47.000000|
    +------------------------+--------------------------+

Please use hh for hour instead of HH if you want to parse 12 hour format. Also am/pm is indicated by suffix "a" while parsing.

Hope this helps!!

Upvotes: 10

Eric Bellet
Eric Bellet

Reputation: 2045

Using SQL syntax:

select date_format(to_timestamp(ColumnTimestamp, "MM/dd/yyyy hh:mm:ss aa"), "yyyy-MM-dd") as ColumnDate 
from database_name.table_name

Upvotes: 0

stack0114106
stack0114106

Reputation: 8721

Using to_timestamp and date_format functions

scala> df.withColumn("times2",to_timestamp('times,"MMM d, yyyy hh:mm:ss a")).show(false)
+------------------------+-------------------+
|times                   |times2             |
+------------------------+-------------------+
|Nov 05, 2018 02:46:47 AM|2018-11-05 02:46:47|
|Nov 5, 2018 02:46:47 PM |2018-11-05 14:46:47|
+------------------------+-------------------+


scala> df.withColumn("times2",date_format(to_timestamp('times,"MMM d, yyyy hh:mm:ss a"),"yyyy-MM-dd HH:mm:ss.SSSSSS")).show(false)
+------------------------+--------------------------+
|times                   |times2                    |
+------------------------+--------------------------+
|Nov 05, 2018 02:46:47 AM|2018-11-05 02:46:47.000000|
|Nov 5, 2018 02:46:47 PM |2018-11-05 14:46:47.000000|
+------------------------+--------------------------+


scala>

Upvotes: 3

Related Questions