Reputation: 217
I have a date string like '06/21/2021 9:27 AM', and I want to convert it into a timestamp type in PySpark.
I have tried this approach, along with others, and it always seems to return null.
from pyspark.sql.functions import from_unixtime, unix_timestamp
from pyspark.sql.types import TimestampType

df = df.select(
    from_unixtime(unix_timestamp('date_string', 'MM/dd/yyyy hh:mm:ss a')).cast(TimestampType())
)
Has anyone successfully converted this string format into a timestamp format?
Upvotes: 2
Views: 1323
Reputation: 9247
One option is to use the DataFrame API with withColumn and the to_timestamp function, but first we need to set the timeParserPolicy to LEGACY:
import pyspark.sql.functions as F
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
df.withColumn('ts', F.to_timestamp('date_string', format='MM/dd/yyyy hh:mm a'))
df = spark.createDataFrame([
('06/21/2021 9:27 AM', ),
('06/11/2021 9:02 PM', ),
('01/28/2021 12:56 AM', )
], ('date_string', ))
df = df.withColumn('ts', F.to_timestamp('date_string', format='MM/dd/yyyy hh:mm a'))
df.show()
+-------------------+-------------------+
| date_string| ts|
+-------------------+-------------------+
| 06/21/2021 9:27 AM|2021-06-21 09:27:00|
| 06/11/2021 9:02 PM|2021-06-11 21:02:00|
|01/28/2021 12:56 AM|2021-01-28 00:56:00|
+-------------------+-------------------+
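If you prefer not to change the parser policy, a pattern with a single h for the hour may also work on Spark 3.x's default parser, since a single h accepts a one- or two-digit clock hour. This is a sketch under that assumption, not part of the original answer:
import pyspark.sql.functions as F

# Assumption: on Spark 3.x the default (non-LEGACY) parser accepts a
# one- or two-digit hour when the pattern uses a single 'h'.
df = spark.createDataFrame([
    ('06/21/2021 9:27 AM', ),
    ('06/11/2021 9:02 PM', ),
    ('01/28/2021 12:56 AM', )
], ('date_string', ))

df = df.withColumn('ts', F.to_timestamp('date_string', format='MM/dd/yyyy h:mm a'))
df.show()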
Upvotes: 0
Reputation: 153
06/21/2021 9:27 AM doesn't contain a second-of-minute value, so you should remove the :ss from the parser format. See this example:
spark.sql("select from_unixtime(unix_timestamp('06/21/2021 9:27 AM', 'MM/dd/yyyy hh:mm a')) ts").show()
+-------------------+
| ts|
+-------------------+
|2021-06-21 09:27:00|
+-------------------+
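Applied back to the DataFrame code from the question, the same fix (dropping :ss) would look roughly like this; a sketch assuming a column named date_string and the same parser behavior as the SQL example above:
from pyspark.sql.functions import from_unixtime, unix_timestamp
from pyspark.sql.types import TimestampType

# Sketch: same corrected format, applied to a DataFrame column and cast
# to a timestamp (assumes the column is named 'date_string').
df = df.withColumn(
    'ts',
    from_unixtime(unix_timestamp('date_string', 'MM/dd/yyyy hh:mm a')).cast(TimestampType())
)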
Upvotes: 3