Reputation: 2457
I have a dataframe in Spark which contains Unix(Epoch) time and also timezone name. I hope to convert the epochtime to local time according to different tz name. Here is how my data looks like:
data = [
(1420088400, 'America/New_York'),
(1420088400, 'America/Los_Angeles'),
(1510401180, 'America/New_York'),
(1510401180, 'America/Los_Angeles')]
df = spark.createDataFrame(data, ["epoch_time", "tz_name"])
df.createOrReplaceTempView("df")
df1 = spark.sql("""select *, from_unixtime(epoch_time) as gmt_time,"
from_utc_timestamp(from_unixtime(epoch_time), tz_name) as local_time"
from df""")
df1.show(truncate= False)
Here is the result:
+----------+-------------------+-------------------+---------------------+
|epoch_time|tz_name |gmt_time |local_time |
+----------+-------------------+-------------------+---------------------+
|1420088400|America/New_York |2015-01-01 05:00:00|2015-01-01 00:00:00.0|
|1420088400|America/Los_Angeles|2015-01-01 05:00:00|2014-12-31 21:00:00.0|
|1510401180|America/New_York |2017-11-11 11:53:00|2017-11-11 06:53:00.0|
|1510401180|America/Los_Angeles|2017-11-11 11:53:00|2017-11-11 03:53:00.0|
+----------+-------------------+-------------------+---------------------+
Should I first change epochtime to time string using from_unixtime, then change it to utc timestamp using to_utc_timestamp, finally change this UTC timestamp to local time with tz_name? Tried this but got error
df2 = spark.sql("""select *, from_unixtime(epoch_time) as gmt_time,
from_utc_timestamp(from_unixtime(epoch_time), tz_name) as local_time,
from_utc_timestamp(to_utc_timestamp(from_unixtime(epoch_time),from_unixtime(unix_timestamp(), 'z')), tz_name) as newtime from df""")
How could I check my EMR server timezone?
Tried use , is this the server timezone?
spark.sql("select from_unixtime(unix_timestamp(), 'z')").show()
which gave me:
+--------------------------------------------------------------------------+
|from_unixtime(unix_timestamp(current_timestamp(), yyyy-MM-dd HH:mm:ss), z)|
+--------------------------------------------------------------------------+
| UTC|
+--------------------------------------------------------------------------+
Thank you for your clarification.
Upvotes: 3
Views: 7700
Reputation: 4197
When you call from_unixtime
it will format the date based on your Java runtime's timezone, since it's just using the default timezone for SimpleDateFormat
here. In your case it's UTC. So when you convert the values to local time you would only need to call from_utc_timestamp
with the tz_name
value passed in. However if you were to change your system timezone then you would need to call to_utc_timestamp
first.
Spark 2.2 introduces a timezone setting so you can set the timezone for your SparkSession like so
spark.conf.set("spark.sql.session.timeZone", "GMT")
In which case the time functions will use GMT
vs your system timezone, see source here
Upvotes: 5