ZhouQuan

Reputation: 1067

Change the timestamp to UTC format in Spark using Scala

This question is similar to: Change the timestamp to UTC format in Pyspark

Basically, I need to convert an ISO 8601 timestamp string with an offset to a UTC timestamp string (2017-08-01T14:30:00+05:30 -> 2017-08-01T09:00:00+00:00) using Scala.

I am fairly new to Scala/Java. I checked the Spark library, and it doesn't seem to have a way to convert without knowing the time zone. I don't know the time zone unless I parse it out of the string in an ugly way or use a Java/Scala library. Can someone help?

UPDATE: A better way to do this: set the session time zone in Spark (spark.sql.session.timeZone) and cast the string column with cast(DataTypes.TimestampType), which does the time zone shift.
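
A minimal sketch of what the update describes, assuming a Spark version that supports the spark.sql.session.timeZone setting and a local session. The column names ts_string and ts_utc and the app name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DataTypes

// Local session for illustration only; in spark-shell this returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("utc-cast-sketch").getOrCreate()

// Make Spark display and interpret timestamps in UTC for this session.
spark.conf.set("spark.sql.session.timeZone", "UTC")

import spark.implicits._

// ts_string is a hypothetical column holding ISO 8601 strings with an offset.
val df = Seq("2017-08-01T14:30:00+05:30").toDF("ts_string")

// The cast reads the offset from the string and stores the corresponding instant.
val result = df.withColumn("ts_utc", col("ts_string").cast(DataTypes.TimestampType))

result.show(false)
```

The cast is called on the column rather than on the DataFrame itself. With the session time zone set to UTC, show should display the ts_utc value in UTC.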

Upvotes: 1

Views: 14126

Answers (2)

user9924540

Reputation: 31

org.apache.spark.sql.functions.to_utc_timestamp:

def to_utc_timestamp(ts: Column, tz: String): Column

Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.

Upvotes: 3

Jasper-M

Reputation: 15086

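You can use the java.time classes to parse and convert your timestamp. OffsetDateTime.parse reads the offset from the string itself, so you don't need to know the time zone in advance.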

scala> import java.time.{OffsetDateTime, ZoneOffset}
import java.time.{OffsetDateTime, ZoneOffset}

scala> val datetime = "2017-08-01T14:30:00+05:30"
datetime: String = 2017-08-01T14:30:00+05:30

scala> OffsetDateTime.parse(datetime).withOffsetSameInstant(ZoneOffset.UTC)
res44: java.time.OffsetDateTime = 2017-08-01T09:00Z
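
Note that the default toString prints 2017-08-01T09:00Z rather than the 2017-08-01T09:00:00+00:00 form asked for in the question. If the exact string matters, something like the following should work. It is only a sketch: toUtcString and utcFormat are names made up here, and the pattern is an assumption about the desired output format.

```scala
import java.time.{OffsetDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Lowercase "xxx" prints a zero offset as "+00:00"; uppercase "XXX" would print "Z".
val utcFormat: DateTimeFormatter = DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ssxxx")

// toUtcString is a hypothetical helper name, not a library function.
def toUtcString(iso: String): String =
  OffsetDateTime.parse(iso)                 // reads the offset from the string itself
    .withOffsetSameInstant(ZoneOffset.UTC)  // same instant, expressed at +00:00
    .format(utcFormat)

println(toUtcString("2017-08-01T14:30:00+05:30")) // 2017-08-01T09:00:00+00:00
```

To apply this to a DataFrame column, one option is to wrap the helper with org.apache.spark.sql.functions.udf, at the cost of the usual UDF overhead.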

Upvotes: 1
