Atul

Reputation: 185

PySpark (version 3.0.0): to_timestamp returns null when I convert the event_timestamp column from string to timestamp

Input file format: https://i.sstatic.net/aNDmZ.png

After conversion: https://i.sstatic.net/nobwD.png

I tried other solutions from Stack Overflow, but I am using Spark 3.0.0 and they do not work.

Upvotes: 0

Views: 5639

Answers (2)

Pulkit Chhipa

Reputation: 1

def to_timestamp(s: Column, fmt: String): Column

Converts time string with the given pattern to timestamp.

See Datetime Patterns for valid date and time format patterns.

s: A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

fmt: A date time pattern detailing the format of s when s is a string

returns: A timestamp, or null if s was a string that could not be cast to a timestamp or fmt was an invalid format

Since 2.2.0

Upvotes: 0

notNull

Reputation: 31540

In to_timestamp, match the AM/PM marker with the pattern letter a, and use hh (12-hour clock) instead of HH (24-hour clock).

Example:

sc.version
#'3.0.0-preview2'
df.show()
#+-------------------+
#|    event_timestamp|
#+-------------------+
#|10/14/2016 09:28 PM|
#|10/23/2016 02:41 AM|
#+-------------------+

from pyspark.sql.functions import col, from_unixtime, to_timestamp, unix_timestamp

#using to_timestamp function
df.withColumn("new_ts",to_timestamp(col("event_timestamp"),"MM/dd/yyyy hh:mm a")).show()

#using from_unixtime and unix_timestamp functions
df.withColumn("new_ts",from_unixtime(unix_timestamp(col("event_timestamp"),"MM/dd/yyyy hh:mm a"),'yyyy-MM-dd HH:mm:ss').cast("timestamp")).show()

#+-------------------+-------------------+
#|    event_timestamp|             new_ts|
#+-------------------+-------------------+
#|10/14/2016 09:28 PM|2016-10-14 21:28:00|
#|10/23/2016 02:41 AM|2016-10-23 02:41:00|
#+-------------------+-------------------+

Upvotes: 3
