Reputation: 382
My data has inconsistent fractional-second precision, and I am parsing it with the function below:
to_timestamp("col", "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'")
Data:
time (input) | OUTPUT (actual) | IDEAL (expected)
2022-06-16T07:01:25.346Z | 2022-06-16T07:01:25.346+0000 | 2022-06-16T07:01:25.346+0000
2022-06-16T06:54:21.51Z | 2022-06-16T06:54:21.051+0000 | 2022-06-16T06:54:21.510+0000
2022-06-16T06:54:21.5Z | 2022-06-16T06:54:21.005+0000 | 2022-06-16T06:54:21.500+0000
So the fractional seconds in my data come in S, SS, or SSS precision.
How can I normalise them to SSS the correct way?
Here, .51 means 510 milliseconds, not 051.
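Just to illustrate that positional reading, plain Python's strptime treats the fraction the same way:
from datetime import datetime
# %f accepts 1-6 fractional digits and zero-pads on the right,
# so ".51" is read as 510000 microseconds, i.e. 510 ms.
datetime.strptime("2022-06-16T06:54:21.51", "%Y-%m-%dT%H:%M:%S.%f")
# -> datetime.datetime(2022, 6, 16, 6, 54, 21, 510000)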
Using Spark version: 3.2.1. Code:
import pyspark.sql.functions as F

test = spark.createDataFrame(
    [(1, '2022-06-16T07:01:25.346Z'), (2, '2022-06-16T06:54:21.51Z'), (3, '2022-06-16T06:54:21.5Z')],
    ['no', 'timing1'],
)
timeFmt = "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'"
test = test.withColumn("timing2", F.to_timestamp(F.col('timing1'), format=timeFmt))
test.select("timing1", "timing2").show(truncate=False)
Output: see the OUTPUT column in the table above.
Upvotes: 1
Views: 382
Reputation: 382
I was using this setting:
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
After resetting it, parsing works as expected.
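A minimal sketch of reverting it (assuming you want EXCEPTION, the Spark 3.x default, back; CORRECTED also switches to the new Java-time parser):
# Remove the session-level override ...
spark.conf.unset("spark.sql.legacy.timeParserPolicy")
# ... or set the policy back explicitly:
spark.conf.set("spark.sql.legacy.timeParserPolicy", "EXCEPTION")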
Upvotes: 1
Reputation: 2416
I also use v3.2.1, and it works for me if you just don't pass a format to to_timestamp — the string is already in a format Spark parses by default:
from pyspark.sql import functions as F

test = spark.createDataFrame(
    [(1, '2022-06-16T07:01:25.346Z'), (2, '2022-06-16T06:54:21.51Z'), (3, '2022-06-16T06:54:21.5Z')],
    ['no', 'timing1'],
)
new_df = test.withColumn('timing1_ts', F.to_timestamp('timing1'))
new_df.show(truncate=False)
new_df.dtypes
+---+------------------------+-----------------------+
|no |timing1 |timing1_ts |
+---+------------------------+-----------------------+
|1 |2022-06-16T07:01:25.346Z|2022-06-16 07:01:25.346|
|2 |2022-06-16T06:54:21.51Z |2022-06-16 06:54:21.51 |
|3 |2022-06-16T06:54:21.5Z |2022-06-16 06:54:21.5 |
+---+------------------------+-----------------------+
Out[9]: [('no', 'bigint'), ('timing1', 'string'), ('timing1_ts', 'timestamp')]
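If you then need strings with exactly three fractional digits, date_format pads the milliseconds back out to SSS (a sketch; timing1_norm is just an illustrative column name):
# Format the parsed timestamp back to text with zero-padded milliseconds.
new_df = new_df.withColumn(
    'timing1_norm',
    F.date_format('timing1_ts', "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
)
new_df.select('timing1', 'timing1_norm').show(truncate=False)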
Upvotes: 1