user2913139

Reputation: 627

PySpark: convert to timestamp from custom format

In PySpark I have the following column:

| startedDateTime             |
| --------------------------- |
| 2023-01-13T00:00:57.126000Z |

I want to convert it to epoch seconds.

I tried .withColumn("startedDateTime_ts", F.unix_timestamp("startedDateTime", 'yyyy-MM-ddTHH:mm:ss.SSS000Z'))

But it fails with the error "Unknown pattern letter: T".

Why? How can I pass a custom format to the unix_timestamp() function?

I checked the documentation (https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.unix_timestamp.html), but the format argument does not seem to be well documented.

Thanks,

Upvotes: 2

Views: 268

Answers (1)

wwnde

Reputation: 26676

Convert with to_timestamp first, then apply unix_timestamp.

from pyspark.sql.functions import col, to_timestamp, unix_timestamp

df1 = spark.createDataFrame(
    [('a', '2022-05-03T14:00:00.000Z', '2022-05-03T14:18:00.000Z'),
     ('a', '2022-05-03T11:38:00.000Z', '2022-05-03T12:18:00.000Z'),
     ('c', '2022-05-03T13:15:00.000Z', '2022-05-03T13:48:00.000Z'),
     ('c', '2022-05-03T13:15:00.000Z', '2023-01-13T00:00:57.126000Z')],
    ('id', 'start_ts', 'end_ts'))

# to_timestamp parses the ISO-8601 strings without needing a pattern;
# unix_timestamp on the resulting timestamp column returns epoch seconds.
# When the input is already a timestamp, the format argument is ignored,
# so timestamp_1, timestamp_2 and timestamp_3 below are all the same value.
df = df1.withColumn('ts', to_timestamp('start_ts')).select(
    unix_timestamp(col('ts')).alias('timestamp_1'),
    unix_timestamp(col('ts'), 'MM-dd-yyyy HH:mm:ss').alias('timestamp_2'),
    unix_timestamp(col('ts'), 'MM-dd-yyyy').alias('timestamp_3'),
    unix_timestamp('ts').alias('start_ts'))
df.show()
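
As to the "Why?": in Spark 3's datetime patterns (which follow Java's DateTimeFormatter), T is not a defined pattern letter, so literal text like the T separator must be enclosed in single quotes. A minimal sketch of passing the custom format directly to unix_timestamp, assuming the pattern letters SSSSSS (up to six fractional digits) and XXX (zone offset, which matches the literal Z for UTC); it reuses end_ts from the df1 above, whose last row holds the question's example value:

from pyspark.sql import functions as F

# 'T' is quoted so it is parsed as literal text, not a pattern letter;
# SSSSSS matches the six fractional digits and XXX matches the trailing Z.
df2 = df1.withColumn(
    'startedDateTime_ts',
    F.unix_timestamp('end_ts', "yyyy-MM-dd'T'HH:mm:ss.SSSSSSXXX"))
df2.show(truncate=False)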

Upvotes: 2
