Tomasz
Tomasz

Reputation: 610

Spark scala - calculating dynamic timestamp interval

have dataframe with a timestamp column (timestamp type) called "maxTmstmp" and another column with hours, represented as integers called "WindowHours". I would like to dynamically subtract timestamp and integer columns to get lower timestamp.

My data and desired effect ("minTmstmp" column):

+-----------+-------------------+-------------------+
|WindowHours|          maxTmstmp|          minTmstmp|
|           |                   |(maxTmstmp - Hours)|
+-----------+-------------------+-------------------+
|          1|2016-01-01 23:00:00|2016-01-01 22:00:00|
|          2|2016-03-01 12:00:00|2016-03-01 10:00:00|
|          8|2016-03-05 20:00:00|2016-03-05 12:00:00|
|         24|2016-04-12 11:00:00|2016-04-11 11:00:00|
+-----------+-------------------+-------------------+

 root
     |-- WindowHours: integer (nullable = true)
     |-- maxTmstmp: timestamp (nullable = true)

I have already found an expressions with hours interval solution, but it isn't dynamic. Code below doesn't work as intended.

standards.
      .withColumn("minTmstmp", $"maxTmstmp" - expr("INTERVAL 10 HOURS"))
      .show()

Operate on Spark 2.4 and scala.

Upvotes: 1

Views: 1394

Answers (1)

Leo C
Leo C

Reputation: 22439

One simple way would be to convert maxTmstmp to unix time, subtract the value of WindowHours in seconds from it, and convert the result back to Spark Timestamp, as shown below:

import java.sql.Timestamp
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(
  (1, Timestamp.valueOf("2016-01-01 23:00:00")),
  (2, Timestamp.valueOf("2016-03-01 12:00:00")),
  (8, Timestamp.valueOf("2016-03-05 20:00:00")),
  (24, Timestamp.valueOf("2016-04-12 11:00:00"))
).toDF("WindowHours", "maxTmstmp")

df.withColumn("minTmstmp",
    from_unixtime(unix_timestamp($"maxTmstmp") - ($"WindowHours" * 3600))
  ).show
// +-----------+-------------------+-------------------+
// |WindowHours|          maxTmstmp|          minTmstmp|
// +-----------+-------------------+-------------------+
// |          1|2016-01-01 23:00:00|2016-01-01 22:00:00|
// |          2|2016-03-01 12:00:00|2016-03-01 10:00:00|
// |          8|2016-03-05 20:00:00|2016-03-05 12:00:00|
// |         24|2016-04-12 11:00:00|2016-04-11 11:00:00|
// +-----------+-------------------+-------------------+

Upvotes: 4

Related Questions