Reputation: 1164
I'm trying to compare timestamps within a map, but Spark seems to be using a different time zone, or something else strange is going on. I read a dummy CSV file like the following to build the input DataFrame:
"ts"
"1970-01-01 00:00:00"
"1970-01-01 00:00:00"
df.show(2)
+-------------------+
|                 ts|
+-------------------+
|1970-01-01 00:00:00|
|1970-01-01 00:00:00|
+-------------------+
So far, nothing to report, but then:
import java.sql.Timestamp
import java.time.Instant

df.rdd.map { row =>
  val timestamp = row.getTimestamp(0)            // value read from the CSV
  val timestampMilli = timestamp.toInstant.toEpochMilli
  val epoch = Timestamp.from(Instant.EPOCH)      // reference: 1970-01-01T00:00:00Z
  val epochMilli = epoch.toInstant.toEpochMilli
  (timestamp, timestampMilli, epoch, epochMilli)
}.foreach(println)
(1970-01-01 00:00:00.0,-3600000,1970-01-01 01:00:00.0,0)
(1970-01-01 00:00:00.0,-3600000,1970-01-01 01:00:00.0,0)
I don't understand why both timestamps aren't 1970-01-01 00:00:00.0, 0. Does anyone know what I'm missing?
NB: I have already set the session time zone to UTC, using the following properties.
spark.sql.session.timeZone=UTC
user.timezone=UTC
Upvotes: 3
Views: 1200
Reputation: 241450
The java.sql.Timestamp class inherits from java.util.Date. They both have the behavior of storing a UTC-based numeric timestamp, but displaying time in the local time zone. You'd see this with .toString() in Java, the same as you're seeing with println in the code given.
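Here's a quick plain-JVM illustration of that behavior (no Spark involved; the zones are just examples to show that only the printed form changes):

import java.sql.Timestamp
import java.time.Instant
import java.util.TimeZone

// The stored value is a UTC-based instant; only toString depends on the
// JVM default time zone.
val t = Timestamp.from(Instant.EPOCH)

TimeZone.setDefault(TimeZone.getTimeZone("Europe/London"))
println(t)          // 1970-01-01 01:00:00.0  (London was UTC+1 at the epoch)
println(t.getTime)  // 0

TimeZone.setDefault(TimeZone.getTimeZone("UTC"))
println(t)          // 1970-01-01 00:00:00.0
println(t.getTime)  // 0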
I believe your OS (or environment) is set to something similar to Europe/London. Keep in mind that at the Unix epoch (1970-01-01T00:00:00Z), London was on BST (UTC+1).
Your timestampMilli variable is showing -3600000 because your input was interpreted in local time as 1970-01-01T00:00:00+01:00, which is equivalent to 1969-12-31T23:00:00Z.
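You can reproduce that arithmetic without Spark. This is only a sketch of the parsing math in a UTC+1 zone, not a claim about what Spark's CSV reader does internally:

import java.sql.Timestamp
import java.util.TimeZone

// Parse the CSV string as local wall-clock time in a zone that was UTC+1
// in 1970, then look at the underlying epoch millis.
TimeZone.setDefault(TimeZone.getTimeZone("Europe/London"))
val parsed = Timestamp.valueOf("1970-01-01 00:00:00")
println(parsed.getTime)  // -3600000, i.e. 1969-12-31T23:00:00Z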
Your epoch variable is showing 1970-01-01 01:00:00.0 because 0 is equivalent to 1970-01-01T00:00:00Z, which is equivalent to 1970-01-01T01:00:00+01:00.
I do see you noted that you set your session time zone to UTC, which in theory should work. But the results clearly show it isn't being used. Sorry, I don't know Spark well enough to tell you why, but I would focus on that part of the problem.
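I can't verify this against Spark myself, but one thing that might be worth trying (a sketch only, assuming the issue is that the JVM default zone differs from the session zone): spark.sql.session.timeZone governs Spark SQL's own conversions, while java.sql.Timestamp parsing and printing use the JVM default time zone, so that default would need to be UTC on the driver and on the executors as well.

import java.util.TimeZone
import org.apache.spark.sql.SparkSession

// Make the driver JVM's default zone UTC (Timestamp.toString/valueOf use it).
TimeZone.setDefault(TimeZone.getTimeZone("UTC"))

val spark = SparkSession.builder()
  .config("spark.sql.session.timeZone", "UTC")
  // Executors run in separate JVMs, so pass the flag to them too.
  .config("spark.executor.extraJavaOptions", "-Duser.timezone=UTC")
  .getOrCreate()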
Upvotes: 4