Reputation: 1067
This question is similar to: Change the timestamp to UTC format in Pyspark
Basically, I need to convert an ISO 8601 timestamp string with an offset into a UTC timestamp string (2017-08-01T14:30:00+05:30 -> 2017-08-01T09:00:00+00:00) using Scala.
I am fairly new to Scala/Java. I checked the Spark library, and it does not seem to offer a way to convert without knowing the time zone. I do not know the time zone unless I parse it out of the string in some ugly way or use a Java/Scala library. Can someone help?
UPDATE: A better way to do this: set the session time zone in Spark, and call cast(DataTypes.TimestampType) on the column to do the time zone shift.
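A minimal sketch of what the update describes, assuming Spark 2.2 or later (where the spark.sql.session.timeZone setting is available) and a local session. The object name, helper name, and column names are made up for illustration, and the formatted result carries no offset suffix because the session zone is already UTC.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format}
import org.apache.spark.sql.types.DataTypes

object SessionTimeZoneCast {
  // Hypothetical helper: casts ISO 8601 strings to timestamps and
  // renders them as strings in the session time zone (set to UTC here).
  def utcStrings(spark: SparkSession, values: Seq[String]): Seq[String] = {
    import spark.implicits._

    // Timestamps are rendered in this zone by show() and date_format.
    spark.conf.set("spark.sql.session.timeZone", "UTC")

    // The cast reads the offset embedded in each string,
    // so no time zone name has to be supplied.
    val withTs = values.toDF("ts_string")
      .withColumn("ts", col("ts_string").cast(DataTypes.TimestampType))

    withTs
      .select(date_format(col("ts"), "yyyy-MM-dd'T'HH:mm:ss"))
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("utc-cast-sketch")
      .getOrCreate()
    utcStrings(spark, Seq("2017-08-01T14:30:00+05:30")).foreach(println)
    spark.stop()
  }
}
```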
Upvotes: 1
Views: 14126
Reputation: 31
org.apache.spark.sql.functions.to_utc_timestamp:
def to_utc_timestamp(ts: Column, tz: String): Column
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
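As a rough illustration of that signature, the sketch below applies to_utc_timestamp to a local wall-clock time. It assumes the source zone is known up front ("Asia/Kolkata" is only an example), which is the limitation the question points out. The object, helper, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format, to_utc_timestamp}

object ToUtcTimestampSketch {
  // Hypothetical helper: treats each value as wall-clock time in `zone`
  // and returns the corresponding UTC wall-clock time as a string.
  def localToUtc(spark: SparkSession, localTimes: Seq[String], zone: String): Seq[String] = {
    import spark.implicits._

    val shifted = localTimes.toDF("local_time")
      .withColumn("utc_time", to_utc_timestamp(col("local_time").cast("timestamp"), zone))

    shifted
      .select(date_format(col("utc_time"), "yyyy-MM-dd HH:mm:ss"))
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("to-utc-timestamp-sketch")
      .getOrCreate()
    localToUtc(spark, Seq("2017-08-01 14:30:00"), "Asia/Kolkata").foreach(println)
    spark.stop()
  }
}
```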
Upvotes: 3
Reputation: 15086
You can use the java.time primitives to parse and convert your timestamp.
scala> import java.time.{OffsetDateTime, ZoneOffset}
import java.time.{OffsetDateTime, ZoneOffset}
scala> val datetime = "2017-08-01T14:30:00+05:30"
datetime: String = 2017-08-01T14:30:00+05:30
scala> OffsetDateTime.parse(datetime).withOffsetSameInstant(ZoneOffset.UTC)
res44: java.time.OffsetDateTime = 2017-08-01T09:00Z
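The default toString above prints the UTC offset as Z and drops the zero seconds. If the exact 2017-08-01T09:00:00+00:00 shape from the question is needed, one option is an explicit DateTimeFormatter. This is a sketch; the object name ToUtcString and the helper toUtcString are made up for illustration.

```scala
import java.time.{OffsetDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

object ToUtcString {
  // Lowercase 'xxx' prints a zero offset as "+00:00";
  // uppercase 'XXX' would print "Z" instead.
  private val utcFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssxxx")

  // Hypothetical helper: parses an ISO 8601 string with an offset,
  // shifts it to the same instant in UTC, and formats it back to a string.
  def toUtcString(iso: String): String =
    OffsetDateTime.parse(iso)
      .withOffsetSameInstant(ZoneOffset.UTC)
      .format(utcFormat)

  def main(args: Array[String]): Unit =
    println(toUtcString("2017-08-01T14:30:00+05:30")) // prints 2017-08-01T09:00:00+00:00
}
```

The offset is read from the string itself, so no time zone name is required.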
Upvotes: 1