newleaf

Reputation: 2457

change Unix(Epoch) time to local time in pyspark

I have a dataframe in Spark that contains Unix (epoch) times along with timezone names. I'd like to convert each epoch time to local time according to its tz_name. Here is what my data looks like:

data = [
    (1420088400, 'America/New_York'),
    (1420088400, 'America/Los_Angeles'),
    (1510401180, 'America/New_York'),
    (1510401180, 'America/Los_Angeles')]

df = spark.createDataFrame(data, ["epoch_time", "tz_name"])

df.createOrReplaceTempView("df")
df1 = spark.sql("""select *, from_unixtime(epoch_time) as gmt_time,
               from_utc_timestamp(from_unixtime(epoch_time), tz_name) as local_time
               from df""")
df1.show(truncate=False)

Here is the result:

+----------+-------------------+-------------------+---------------------+
|epoch_time|tz_name            |gmt_time           |local_time           |
+----------+-------------------+-------------------+---------------------+
|1420088400|America/New_York   |2015-01-01 05:00:00|2015-01-01 00:00:00.0|
|1420088400|America/Los_Angeles|2015-01-01 05:00:00|2014-12-31 21:00:00.0|
|1510401180|America/New_York   |2017-11-11 11:53:00|2017-11-11 06:53:00.0|
|1510401180|America/Los_Angeles|2017-11-11 11:53:00|2017-11-11 03:53:00.0|
+----------+-------------------+-------------------+---------------------+
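As a sanity check outside Spark, the same conversion can be reproduced with Python's standard-library zoneinfo module (the helper name epoch_to_local is just for illustration); the values agree with the local_time column above:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

def epoch_to_local(epoch, tz_name):
    # Interpret the epoch as seconds since 1970-01-01 UTC and render
    # the wall-clock time in the given IANA timezone.
    return datetime.fromtimestamp(epoch, ZoneInfo(tz_name)).strftime("%Y-%m-%d %H:%M:%S")

print(epoch_to_local(1420088400, "America/New_York"))     # 2015-01-01 00:00:00
print(epoch_to_local(1420088400, "America/Los_Angeles"))  # 2014-12-31 21:00:00
print(epoch_to_local(1510401180, "America/New_York"))     # 2017-11-11 06:53:00
print(epoch_to_local(1510401180, "America/Los_Angeles"))  # 2017-11-11 03:53:00
```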
  1. I'm not quite sure whether this conversion is right, but it seems daylight saving time has been taken care of.
  2. Should I first change the epoch time to a time string using from_unixtime, then change it to a UTC timestamp using to_utc_timestamp, and finally change that UTC timestamp to local time with tz_name? I tried this but got an error:

    df2 = spark.sql("""select *, from_unixtime(epoch_time) as gmt_time,
                       from_utc_timestamp(from_unixtime(epoch_time), tz_name) as local_time,
                       from_utc_timestamp(to_utc_timestamp(from_unixtime(epoch_time),from_unixtime(unix_timestamp(), 'z')), tz_name) as newtime from df""")
    
  3. How could I check my EMR server timezone?

  4. I tried the following; is this the server timezone?

    spark.sql("select from_unixtime(unix_timestamp(), 'z')").show()
    

    which gave me:

    +--------------------------------------------------------------------------+
    |from_unixtime(unix_timestamp(current_timestamp(), yyyy-MM-dd HH:mm:ss), z)|
    +--------------------------------------------------------------------------+
    |                                                                       UTC|
    +--------------------------------------------------------------------------+
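Regarding question 3, one Spark-independent way to check the node's OS-level timezone from Python is shown below (a sketch; note the JVM's default timezone, which from_unixtime actually uses, can in principle be configured separately from the OS setting):

```python
import time

# OS-level timezone names for standard and daylight time.
# On an EMR node configured for UTC this is typically ('UTC', 'UTC'),
# consistent with the 'z' output shown above.
print(time.tzname)
```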
    

Thank you for your clarification.

Upvotes: 3

Views: 7700

Answers (1)

Silvio

Reputation: 4197

When you call from_unixtime, it formats the date in your Java runtime's default timezone, since it just uses SimpleDateFormat's default timezone here. In your case that's UTC. So to convert the values to local time, you only need to call from_utc_timestamp with the tz_name value passed in. However, if you were to change your system timezone, then you would need to call to_utc_timestamp first.
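That distinction can be sketched in plain Python (zoneinfo stands in for Spark here; from_unixtime_like and to_utc_timestamp_like are illustrative names, not Spark APIs):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def from_unixtime_like(epoch, sys_tz):
    # Spark's from_unixtime renders the epoch as a wall-clock string
    # in the JVM's default timezone; modeled here as a naive datetime.
    return datetime.fromtimestamp(epoch, ZoneInfo(sys_tz)).replace(tzinfo=None)

def to_utc_timestamp_like(naive_dt, tz_name):
    # Interpret a naive wall-clock timestamp as tz_name and shift it to UTC.
    return (naive_dt.replace(tzinfo=ZoneInfo(tz_name))
                    .astimezone(ZoneInfo("UTC"))
                    .replace(tzinfo=None))

epoch = 1510401180
# System timezone UTC: from_unixtime already yields UTC, so the
# extra to_utc_timestamp hop is a no-op.
assert to_utc_timestamp_like(from_unixtime_like(epoch, "UTC"), "UTC") \
    == from_unixtime_like(epoch, "UTC")
# System timezone America/New_York: the hop is needed to recover UTC.
assert to_utc_timestamp_like(from_unixtime_like(epoch, "America/New_York"),
                             "America/New_York") \
    == from_unixtime_like(epoch, "UTC")
```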

Spark 2.2 introduces a timezone setting, so you can set the timezone for your SparkSession like so:

spark.conf.set("spark.sql.session.timeZone", "GMT")

In that case the time functions will use GMT instead of your system timezone (see the source for details).

Upvotes: 5
