Prasoon

Reputation: 217

How to convert string date into timestamp in pyspark?

I have a date string like '06/21/2021 9:27 AM' and I want to convert it to a timestamp type in PySpark. I have tried the approach below, along with others, and it always seems to return null.

from pyspark.sql.functions import from_unixtime, unix_timestamp
from pyspark.sql.types import TimestampType

df = df.select(
  from_unixtime(unix_timestamp('date_string', 'MM/dd/yyyy hh:mm:ss a')).cast(TimestampType())
)

Has anyone successfully converted this string format into a timestamp format?

Upvotes: 2

Views: 1323

Answers (2)

Ric S

Reputation: 9247

One option is to use the DataFrame API with withColumn and the to_timestamp function, but first we need to set spark.sql.legacy.timeParserPolicy to LEGACY so that the single-digit hour (9:27 rather than 09:27) is parsed correctly:

import pyspark.sql.functions as F

spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

df.withColumn('ts', F.to_timestamp('date_string', format='MM/dd/yyyy hh:mm a'))

Example

df = spark.createDataFrame([
  ('06/21/2021 9:27 AM', ),
  ('06/11/2021 9:02 PM', ),
  ('01/28/2021 12:56 AM', )
], ('date_string', ))

df = df.withColumn('ts', F.to_timestamp('date_string', format='MM/dd/yyyy hh:mm a'))

df.show()

+-------------------+-------------------+
|        date_string|                 ts|
+-------------------+-------------------+
| 06/21/2021 9:27 AM|2021-06-21 09:27:00|
| 06/11/2021 9:02 PM|2021-06-11 21:02:00|
|01/28/2021 12:56 AM|2021-01-28 00:56:00|
+-------------------+-------------------+
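As a side note, if you prefer not to issue a SQL statement, the same setting can be applied through the runtime config. A minimal sketch, assuming an active SparkSession named spark:

# equivalent to the SQL "set" statement above
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")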

Upvotes: 0

Wanda

Reputation: 153

06/21/2021 9:27 AM doesn't contain a second-of-minute value, so you should remove the :ss from the parser format. See this example:

spark.sql("select from_unixtime(unix_timestamp('06/21/2021 9:27 AM', 'MM/dd/yyyy hh:mm a')) ts").show()

+-------------------+
|                 ts|
+-------------------+
|2021-06-21 09:27:00|
+-------------------+
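For completeness, here is a minimal sketch of the same fix applied to the original DataFrame expression from the question (assuming a DataFrame df with a date_string column; on Spark 3+ you may also need the LEGACY parser policy from the other answer because of the single-digit hour):

import pyspark.sql.functions as F
from pyspark.sql.types import TimestampType

# same pattern as above, with the :ss removed
df = df.withColumn(
    'ts',
    F.from_unixtime(F.unix_timestamp('date_string', 'MM/dd/yyyy hh:mm a')).cast(TimestampType())
)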

Upvotes: 3
