Convert string (with timestamp) to timestamp in pyspark

Question

I have a DataFrame with a datetime column stored as a string. When I convert it to a timestamp, the values change. Here is my code; can anyone help me convert it without changing the values?

from pyspark.sql.functions import to_timestamp

# assumes an existing SparkSession named `spark`
df = spark.createDataFrame(
        data=[("1", "2020-04-06 15:06:16 +00:00")],
        schema=["id", "input_timestamp"])
df.printSchema()

# Parse the timestamp string into a TimestampType column (default format)
df = df.withColumn("timestamp", to_timestamp("input_timestamp"))
# Cast the TimestampType column back to a string to inspect the value
df.withColumn('timestamp_string', \
         to_timestamp('timestamp').cast('string')) \
  .show(truncate=False)

This is the output:

+---+--------------------------+-------------------+-------------------+
|id |input_timestamp           |timestamp          |timestamp_string   |
+---+--------------------------+-------------------+-------------------+
|1  |2020-04-06 15:06:16 +00:00|2020-04-06 08:06:16|2020-04-06 08:06:16|
+---+--------------------------+-------------------+-------------------+

Why is the hour changing from 15 to 8, and how can I prevent it?

notNull · Accepted Answer

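I believe to_timestamp is honoring the +00:00 offset in your data: it reads the value as UTC and then displays it in your session's local time zone, which here appears to be 7 hours behind UTC (15:06 becomes 08:06).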
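To see where the 7 hours come from, here is a minimal sketch using only Python's standard library (not Spark). Spark renders timestamps in the session time zone given by `spark.sql.session.timeZone`, which defaults to the JVM's local zone. The sketch assumes that zone is UTC-7 (for example Pacific Daylight Time in April); the asker's actual zone is not stated.

```python
from datetime import datetime, timedelta, timezone

raw = "2020-04-06 15:06:16 +00:00"

# Parse the string *with* its offset, as Spark's default to_timestamp does.
# %z accepts the colon form "+00:00" on Python 3.7+.
instant = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S %z")

# Assumption: the session time zone is a fixed UTC-7 offset.
session_tz = timezone(timedelta(hours=-7))

# Render the same instant in the session zone, similar to what show() prints.
local = instant.astimezone(session_tz)
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2020-04-06 08:06:16
```

Both values refer to the same moment; only the wall-clock rendering differs.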

Try passing the format explicitly to the to_timestamp() function.

Example:

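from pyspark.sql.functions import col, to_timestamp

# "+00:00" is written literally in the pattern, so it is matched as text and no offset is applied
df.withColumn("timestamp", to_timestamp(col("input_timestamp"), "yyyy-MM-dd HH:mm:ss +00:00")).show(10, False)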
#+---+--------------------------+-------------------+
#|id |input_timestamp           |timestamp          |
#+---+--------------------------+-------------------+
#|1  |2020-04-06 15:06:16 +00:00|2020-04-06 15:06:16|
#+---+--------------------------+-------------------+
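The explicit pattern works because `+00:00` contains no pattern letters, so it is matched as literal text and the offset is never applied; the wall-clock value is kept as-is. Python's `strptime` behaves analogously, which makes for a quick standard-library sketch of the trade-off (an illustration, not Spark code): the literal pattern keeps 15:06:16 but only matches strings ending in exactly `+00:00`.

```python
from datetime import datetime

raw = "2020-04-06 15:06:16 +00:00"

# "+00:00" written literally in the format: the offset is matched as text,
# not parsed, so the result is a naive datetime with the original wall clock.
naive = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S +00:00")
print(naive)         # 2020-04-06 15:06:16
print(naive.tzinfo)  # None: the offset information is gone

# The literal pattern only matches that exact offset; any other offset fails.
try:
    datetime.strptime("2020-04-06 15:06:16 +02:00", "%Y-%m-%d %H:%M:%S +00:00")
except ValueError as exc:
    print("no match:", exc)
```

One consequence: because the offset is discarded, Spark interprets 15:06:16 as wall-clock time in the session time zone, so the stored instant differs from the original 15:06:16 UTC by the session offset. If the instant itself matters, an alternative is to keep the default parsing and set `spark.sql.session.timeZone` to `UTC` (for example with `spark.conf.set("spark.sql.session.timeZone", "UTC")`), which makes `show()` display 15:06:16 without dropping the offset.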
