abhinee S

Reputation: 39

Spark 3.0 UTC to AKST conversion fails with ZoneRulesException: Unknown time-zone ID

I am not able to convert timestamps from UTC to the AKST timezone in Spark 3.0. The same conversion works in Spark 2.4, and all other conversions (to EST, PST, MST, etc.) work fine.

I'd appreciate any input on how to fix this error.

The command below:

spark.sql("select from_utc_timestamp('2020-10-01 11:12:30', 'AKST')").show

returns the error:

java.time.zone.ZoneRulesException: Unknown time-zone ID: AKST

Detailed log:

java.time.zone.ZoneRulesException: Unknown time-zone ID: AKST
  at java.time.zone.ZoneRulesProvider.getProvider(ZoneRulesProvider.java:272)
  at java.time.zone.ZoneRulesProvider.getRules(ZoneRulesProvider.java:227)
  at java.time.ZoneRegion.ofId(ZoneRegion.java:120)
  at java.time.ZoneId.of(ZoneId.java:411)
  at java.time.ZoneId.of(ZoneId.java:359)
  at java.time.ZoneId.of(ZoneId.java:315)
  at org.apache.spark.sql.catalyst.util.DateTimeUtils$.getZoneId(DateTimeUtils.scala:62)
  at org.apache.spark.sql.catalyst.util.DateTimeUtils$.fromUTCTime(DateTimeUtils.scala:833)
  at org.apache.spark.sql.catalyst.expressions.FromUTCTimestamp.nullSafeEval(datetimeExpressions.scala:1299)
  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:552)
  at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:457)
  at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52)
  at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:321)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:321)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:326)
  at org.apache.spark.sql.catalyst.trees.TreeNode.applyFunctionIfChanged$1(TreeNode.scala:380)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:416)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:248)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:414)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:362)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:326)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDown$1(QueryPlan.scala:96)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:118)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:118)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:129)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:134)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.immutable.List.map(List.scala:298)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:134)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:139)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:248)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:139)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:96)
  at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1.applyOrElse(expressions.scala:45)
  at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1.applyOrElse(expressions.scala:44)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:321)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:321)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:149)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:147)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:326)
  at org.apache.spark.sql.catalyst.trees.TreeNode.applyFunctionIfChanged$1(TreeNode.scala:380)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:416)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:248)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:414)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:362)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:326)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:149)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:147)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:326)
  at org.apache.spark.sql.catalyst.trees.TreeNode.applyFunctionIfChanged$1(TreeNode.scala:380)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:416)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:248)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:414)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:362)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:326)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:149)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:147)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:310)
  at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$.apply(expressions.scala:44)
  at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$.apply(expressions.scala:43)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:149)
  at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
  at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
  at scala.collection.immutable.List.foldLeft(List.scala:89)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:146)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:138)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:138)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:116)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:98)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:116)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:82)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:121)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:153)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
  at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:153)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:82)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:79)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$writePlans$4(QueryExecution.scala:217)
  at org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:381)
  at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$writePlans(QueryExecution.scala:217)
  at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:227)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:96)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:207)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:88)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3653)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2737)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2944)
  at org.apache.spark.sql.Dataset.getRows(Dataset.scala:301)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:338)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:864)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:823)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:832)
  ... 47 elided

Upvotes: 2

Views: 1408

Answers (2)

blackbishop

Reputation: 32650

To add to mck's answer: you are using one of the old Java date-time API's short zone IDs. According to the Databricks blog post A Comprehensive Look at Dates and Timestamps in Apache Spark™ 3.0, Spark migrated to the new API in version 3.0:

Since Java 8, the JDK has exposed a new API for date-time manipulation and time zone offset resolution, and Spark migrated to this new API in version 3.0. Although the mapping of time zone names to offsets has the same source, IANA TZDB, it is implemented differently in Java 8 and higher versus Java 7.

You can verify this by opening spark-shell and listing the supported short zone IDs like this:

import java.time.ZoneId
import scala.collection.JavaConverters._

ZoneId.SHORT_IDS.asScala.keys

//res0: Iterable[String] = Set(CTT, ART, CNT, PRT, PNT, PLT, AST, BST, CST, EST, HST, JST, IST, AGT, NST, MST, AET, BET, PST, ACT, SST, VST, CAT, ECT, EAT, IET, MIT, NET)

Note that AKST does not appear in that set. In general, you should not use abbreviations when specifying timezones; use the area/city format instead (see Which three-letter time zone IDs are not deprecated?). A quick way to check this is sketched below.
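As a minimal sketch (assuming a Java 8+ JVM in spark-shell), you can confirm that the bare abbreviation is rejected by java.time while the region-based ID resolves fine:

import java.time.ZoneId
import scala.util.Try

// Region-based IDs resolve against the JVM's IANA tzdata
Try(ZoneId.of("America/Anchorage")).isSuccess   // true

// The bare abbreviation is neither a region ID nor in ZoneId.SHORT_IDS,
// so java.time (and hence Spark 3.0) rejects it
Try(ZoneId.of("AKST")).isSuccess                // false
ZoneId.SHORT_IDS.containsKey("AKST")            // false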

Upvotes: 2

mck

Reputation: 42342

It seems Spark 3 can't understand AKST, but it does understand America/Anchorage, which I believe corresponds to the AKST timezone:

spark.sql("select from_utc_timestamp('2020-10-01 11:12:30', 'America/Anchorage')").show

Upvotes: 1
