Reputation: 627
In PySpark I have the following column:
| startedDateTime |
| --------------------------- |
| 2023-01-13T00:00:57.126000Z |
I want to convert it to seconds since the epoch.
I tried .withColumn("startedDateTime_ts", F.unix_timestamp("startedDateTime", 'yyyy-MM-ddTHH:mm:ss.SSS000Z'))
but it fails with the error "Unknown pattern letter: T".
Why? How can I pass a custom format to the unix_timestamp()
function?
I checked the documentation (https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.unix_timestamp.html), but the format argument doesn't seem to be well documented.
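For completeness, a minimal reproduction of what I'm running (the one-row DataFrame here just stands in for my real data):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("2023-01-13T00:00:57.126000Z",)], ["startedDateTime"]
)

# This is the call that raises "Unknown pattern letter: T" for me
df.withColumn(
    "startedDateTime_ts",
    F.unix_timestamp("startedDateTime", "yyyy-MM-ddTHH:mm:ss.SSS000Z"),
).show()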
Thanks,
Upvotes: 2
Views: 268
Reputation: 26676
Convert with to_timestamp and then apply unix_timestamp.
from pyspark.sql.functions import col, to_timestamp, unix_timestamp

df1 = spark.createDataFrame(
    [('a', '2022-05-03T14:00:00.000Z', '2022-05-03T14:18:00.000Z'),
     ('a', '2022-05-03T11:38:00.000Z', '2022-05-03T12:18:00.000Z'),
     ('c', '2022-05-03T13:15:00.000Z', '2022-05-03T13:48:00.000Z'),
     ('c', '2022-05-03T13:15:00.000Z', '2023-01-13T00:00:57.126000Z')],
    ('id', 'start_ts', 'end_ts'))

# Parse the ISO-8601 string into a timestamp, then take epoch seconds
df = df1.withColumn('ts', to_timestamp('start_ts')).select(
    unix_timestamp(col("ts")).alias("timestamp_1"),
    unix_timestamp(col("ts"), "MM-dd-yyyy HH:mm:ss").alias("timestamp_2"),
    unix_timestamp(col("ts"), "MM-dd-yyyy").alias("timestamp_3"),
    unix_timestamp('ts').alias("start_ts")
)
df.show()
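Applied directly to your startedDateTime column, a minimal sketch (assuming spark is your active session) would be the following; to_timestamp parses the ISO-8601 string, including the T and Z, without needing a custom pattern:

from pyspark.sql import functions as F

df = spark.createDataFrame(
    [("2023-01-13T00:00:57.126000Z",)], ["startedDateTime"]
)

# to_timestamp parses the ISO-8601 string, then unix_timestamp
# converts the resulting timestamp to epoch seconds
df.withColumn(
    "startedDateTime_ts",
    F.unix_timestamp(F.to_timestamp("startedDateTime")),
).show(truncate=False)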
Upvotes: 2