tooptoop4

Reputation: 330

How to set timezone to UTC in Apache Spark?

In Spark's WebUI (port 8080) and on the environment tab there is a setting of the below:

user.timezone   Zulu

Do you know how/where I can override this to UTC?

Env details:

Upvotes: 16

Views: 61170

Answers (5)

RaHuL VeNuGoPaL

Reputation: 509

You can use the following to set the time zone to any zone you want; your notebook or session will keep that value for current_timestamp() and related functions.

%sql
SET TIME ZONE 'America/Los_Angeles';  -- To get PST
SET TIME ZONE 'America/Chicago';      -- To get CST

The last part should be a city; not every city name is accepted, as far as I tried.

Reference: https://spark.apache.org/docs/latest/sql-ref-syntax-aux-conf-mgmt-set-timezone.html
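Spark's SET TIME ZONE accepts region-based zone IDs of the form "Area/City" from the IANA tz database. As a sketch (not Spark itself), you can check whether a given name exists in that database locally with Python's zoneinfo, which reads the same database:

```python
from zoneinfo import available_timezones

# The set of all IANA "Area/City" zone IDs known to the local tz database
zones = available_timezones()

print("America/Chicago" in zones)  # True  -- a real tz database entry
print("America/Dallas" in zones)   # False -- Dallas is not a tz database city
```

This explains why "not all cities" work: only the cities chosen as representatives in the tz database are valid zone IDs.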

Upvotes: 0

Daniel

Reputation: 1272

Now you can use:

spark.conf.set("spark.sql.session.timeZone", "UTC")

Available since SPARK-18936 (https://issues.apache.org/jira/browse/SPARK-18936), released in 2.2.0.

Additionally, I set my default JVM time zone to UTC to avoid implicit conversions:

import java.util.TimeZone
TimeZone.setDefault(TimeZone.getTimeZone("UTC"))

Otherwise Spark applies an implicit conversion from your default time zone to UTC whenever the timestamp you're parsing carries no time zone information.

Example:

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}
import sparkJob.spark.implicits._

val rawJson = """ {"some_date_field": "2018-09-14 16:05:37"} """

val dsRaw = sparkJob.spark.createDataset(Seq(rawJson))

val output =
  dsRaw
    .select(
      from_json(
        col("value"),
        new StructType(
          Array(
            StructField("some_date_field", DataTypes.TimestampType)
          )
        )
      ).as("parsed")
    ).select("parsed.*")

If my default time zone is Europe/Dublin (GMT+1 at that date) and the Spark SQL session time zone is set to UTC, Spark assumes that "2018-09-14 16:05:37" is in the Europe/Dublin time zone and converts it, so the result is "2018-09-14 15:05:37".
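The shift described above can be reproduced with plain Python's zoneinfo, independent of Spark, since both consult the same tz database:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# The naive timestamp string from the JSON example, with no zone information
naive = datetime.strptime("2018-09-14 16:05:37", "%Y-%m-%d %H:%M:%S")

# Interpreted in the JVM default zone Europe/Dublin (UTC+1 on that date)...
local = naive.replace(tzinfo=ZoneInfo("Europe/Dublin"))

# ...then rendered in the session time zone UTC: the wall clock shifts back one hour
utc = local.astimezone(ZoneInfo("UTC"))
print(utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-09-14 15:05:37
```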

Upvotes: 48

DieterDP

Reputation: 4347

As described in these Spark bug reports (link, link), the most recent Spark versions at the time of writing (3.0.0 and 2.4.6) do not fully or correctly support setting the time zone for all operations, despite the answers by @Moemars and @Daniel.

I suggest avoiding time operations in Spark as much as possible: either perform them yourself after extracting the data from Spark, or use UDFs, as in this question.
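One way to do the conversion yourself, as suggested above, is a plain function that makes the source zone explicit instead of relying on Spark's session or JVM settings. This is a sketch: the function name and the Europe/Dublin default are illustrative, and the (commented-out) UDF registration assumes PySpark is available:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_utc_string(ts: str, source_tz: str = "Europe/Dublin") -> str:
    """Parse a naive timestamp string in an explicit source zone and render it in UTC."""
    naive = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    aware = naive.replace(tzinfo=ZoneInfo(source_tz))
    return aware.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# In Spark this plain function could then be wrapped as a UDF, e.g.:
#   from pyspark.sql.functions import udf
#   df.withColumn("utc_ts", udf(to_utc_string)("some_date_field"))

print(to_utc_string("2018-09-14 16:05:37"))  # 2018-09-14 15:05:37
```

Because the source zone is a parameter, the result no longer depends on whichever time zone the driver or executor JVMs happen to run in.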

Upvotes: 6

Moemars

Reputation: 4862

In some cases you will also want to set the JVM time zone. For example, when loading data into a TimestampType column, Spark interprets the string in the JVM's local time zone. To set the JVM time zone, add extra JVM options for the driver and executor:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName('test') \
    .master('local') \
    .config('spark.driver.extraJavaOptions', '-Duser.timezone=GMT') \
    .config('spark.executor.extraJavaOptions', '-Duser.timezone=GMT') \
    .config('spark.sql.session.timeZone', 'UTC') \
    .getOrCreate()

We do this in our local unit test environment, since our local time is not GMT.

Useful reference: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

Upvotes: 29

Prashant Patel

Reputation: 177

Change your system time zone and check; I hope it works.

Upvotes: -28
