Reputation: 39
I have a pyspark dataframe that has a field, time, that has timestamps in two formats,
"11-04-2019,00:32:13" and "2019-12-05T07:57:16.000Z"
How can I convert all the timestamp to the second format, which is the iso time format that matches this format?
%Y-%m-%dT%H:%M:%S.%fZ
So ideally values like "11-04-2019,00:32:13" should be "2019-11-04T00:32:13.000Z"
Upvotes: 1
Views: 8746
Reputation: 42332
You can cast the timestamps to timestamp type first, then use date_format
to convert them back to the desired format:
df.show(truncate=False)
+------------------------+
|time |
+------------------------+
|11-04-2019 00:32:13 |
|2019-12-05T07:57:16.000Z|
+------------------------+
import pyspark.sql.functions as F
df2 = df.withColumn(
'time',
F.date_format(
F.coalesce(
F.to_timestamp('time'),
F.to_timestamp('time', 'MM-dd-yyyy HH:mm:ss')
),
"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
)
)
df2.show(truncate=False)
+------------------------+
|time |
+------------------------+
|2019-11-04T00:32:13.000Z|
|2019-12-05T07:57:16.000Z|
+------------------------+
Upvotes: 9