Reputation: 301
for certain strings - unix_timestamp return null
raw_data.select(F.unix_timestamp(F.lit("2019-03-10T02:56:36Z"),format=date_format)).show(1)
+--------------------------------------------------------------+
|unix_timestamp(2019-03-10T02:56:36Z, yyyy-MM-dd'T'HH:mm:ss'Z')|
+--------------------------------------------------------------+
| null|
+--------------------------------------------------------------+
only showing top 1 row
but for almost the same string - i do get the answer:
+---------------------------------------------------------------------------------------------+
|unix_timestamp(to_utc_timestamp(2019-03-10T02:56:36Z, America/New_York), yyyy-MM-dd HH:mm:ss)|
+---------------------------------------------------------------------------------------------+
| 1552204596|
+---------------------------------------------------------------------------------------------+
only showing top 1 row
when i first convert the problematic string to UTC timestamp - it works....
raw_data.select(F.unix_timestamp(F.to_utc_timestamp(F.lit("2019-03-10T02:56:36Z"), "America/New_York"))).show(1)
+---------------------------------------------------------------------------------------------+
|unix_timestamp(to_utc_timestamp(2019-03-10T02:56:36Z, America/New_York), yyyy-MM-dd HH:mm:ss)|
+---------------------------------------------------------------------------------------------+
| 1552204596|
+---------------------------------------------------------------------------------------------+
only showing top 1 row
is it a problem in converting string with unix_timestamp ? how can i avoid converting to UTC ?
Upvotes: 1
Views: 816
Reputation: 31490
Issue is with the timestamp 2019-03-10T02:56:36Z
as daylight saving time starts on march 10 at 2AM.
On daylight savings time clock will be set to 3AM
skipping 2AM
entirely, So there is no 02:56
time recorded and that's the reason for resulting null
.
When to use to_utc_timestamp
you are converting timestamp to UTC i.e +5:00
and then time becomes 07:56
which is valid results unix_time without null
.
To Fix the issue use regexp_replace
inside unix_timestamp
to replace 2019-03-10T02
to 2019-03-10T03
.
Example:
#invalid time
spark.sql("""select unix_timestamp('2019-03-10T02:56:36Z', "yyyy-MM-dd'T'HH:mm:ss'Z'") unix_time""").show()
#+---------+
#|unix_time|
#+---------+
#| null|
#+---------+
#valid time
spark.sql("""select unix_timestamp('2019-03-10T03:56:36Z', "yyyy-MM-dd'T'HH:mm:ss'Z'") unix_time""").show()
#+----------+
#| unix_time|
#+----------+
#|1552208196|
#+----------+
#to fix the issue use regexp replace to replace 2019-03-10T02 to 2019-03-10T03
spark.sql("""select unix_timestamp(regexp_replace('2019-03-10T02:56:36Z','2019-03-10T02','2019-03-10T03'), "yyyy-MM-
+----------+
| unix_time|
+----------+
|1552208196|
+----------+
Upvotes: 1