Reputation: 148
I am using Spark 2.1.0 on Unix and found a weird issue where unix_timestamp changes the hour for one particular timestamp. I created a DataFrame as below.
The first record in df2 has "20170312020200" as a String, which I later cast to a timestamp in df3. The hour should be 02, but it comes out as 03 in df3. The second record converts from string to timestamp without any issue.
This doesn't happen when I run the app locally from IntelliJ, but it does happen when we run the app with spark-submit.
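A minimal sketch of the kind of code described (the column name and exact steps here are assumptions, not the original snippet):

import org.apache.spark.sql.functions.unix_timestamp
import spark.implicits._   // assuming a SparkSession named `spark`, as in spark-shell

// Hypothetical reconstruction of the steps described above; names are assumptions.
val df2 = Seq("20170312020200", "20170312050200").toDF("datee")
val df3 = df2.withColumn("datee",
  unix_timestamp($"datee", "yyyyMMddHHmmss").cast("timestamp"))
df3.show(false)  // on the affected cluster the first row shows 03:02:00 instead of 02:02:00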
Upvotes: 0
Views: 10911
Reputation: 11469
I am using Spark 2, and you can see the following results. Your issue is not related to unix_timestamp or the Spark version; please check your data.
import org.apache.spark.sql.functions.unix_timestamp

// Build a DataFrame holding the raw 14-digit strings
val df2 = sc.parallelize(Seq(
  (10, "date", "20170312020200"), (10, "date", "20170312050200"))
).toDF("id ", "somthing ", "datee")
df2.show()

// Parse the string with the yyyyMMddHHmmss pattern and cast it to a timestamp
val df3 = df2.withColumn("datee", unix_timestamp($"datee", "yyyyMMddHHmmss").cast("timestamp"))
df3.show()
+---+---------+--------------+
|id |somthing | datee|
+---+---------+--------------+
| 10| date|20170312020200|
| 10| date|20170312050200|
+---+---------+--------------+
+---+---------+-------------------+
|id |somthing | datee|
+---+---------+-------------------+
| 10| date|2017-03-12 02:02:00|
| 10| date|2017-03-12 05:02:00|
+---+---------+-------------------+
import org.apache.spark.sql.functions.unix_timestamp
df2: org.apache.spark.sql.DataFrame = [id : int, somthing : string ... 1 more field]
df3: org.apache.spark.sql.DataFrame = [id : int, somthing : string ... 1 more field]
Upvotes: 1
Reputation: 18434
March 12, 2017 2:02 AM is not a valid time in a lot of time zones. That was when daylight saving time kicked in, and the clock skipped from 1:59:59 to 3:00:00 in the US.
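A minimal sketch of that gap using plain java.time (America/New_York is just one example of a US zone that observed the jump):

import java.time.{LocalDateTime, ZoneId}

// In a zone that sprang forward on 2017-03-12, 02:02 local time does not exist,
// so resolving it against the zone pushes it forward to the end of the gap: 03:02.
val zone  = ZoneId.of("America/New_York")
val inGap = LocalDateTime.of(2017, 3, 12, 2, 2, 0)
println(inGap.atZone(zone))    // 2017-03-12T03:02-04:00[America/New_York]

// A time outside the gap is left unchanged.
val outside = LocalDateTime.of(2017, 3, 12, 5, 2, 0)
println(outside.atZone(zone))  // 2017-03-12T05:02-04:00[America/New_York]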
My guess is that your local machine and the Spark cluster have different system time zone settings.
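One way to check is to print the JVM default zone in both environments, since in Spark 2.1 unix_timestamp parses against the JVM default time zone. A sketch (the UTC choice in the submit options is just an example):

import java.util.TimeZone

// Shows which zone the driver JVM is using; run the same check on the executors.
println(TimeZone.getDefault.getID)

If the zones differ, you can pin them at submit time by passing -Duser.timezone=UTC through spark.driver.extraJavaOptions and spark.executor.extraJavaOptions so the local run and the cluster run resolve timestamps the same way.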
Upvotes: 4