novice8989

Reputation: 169

Compare timestamp columns - Spark scala

I am trying to compare two timestamp columns and return the minimum of the two, and I am wondering whether there is a better way than what I have. Note that both columns might have values, or one of them may have a value while the other is null. I know about when/otherwise.

import org.apache.spark.sql.functions._
import java.sql.Timestamp

val compareTime = udf((t1: Timestamp, t2: Timestamp) => {
  if (t1 != null && t2 != null && t1.before(t2)) {
    Some(t1)
  } else if (t1 != null && t2 != null && t2.before(t1)) {
    Some(t2)
  } else if (t1 != null) {
    Some(t1)
  } else if (t2 != null) {
    Some(t2)
  } else {
    None
  }
})

var df = Seq((1L, "2021-01-04 16:10:00", "2021-01-04 15:20:00")).toDF("id", "t1", "t2")
df = (df.withColumn("t1", to_timestamp($"t1", "yyyy-MM-dd HH:mm:ss"))
      .withColumn("t2", to_timestamp($"t2", "yyyy-MM-dd HH:mm:ss")))

df = df.withColumn("t3", compareTime($"t1", $"t2"))
df.show()

Upvotes: 0

Views: 952

Answers (2)

mck

Reputation: 42352

A UDF is probably unnecessary here - you can use the Spark SQL function least:

var df = Seq((1L, "2021-01-04 16:10:00","2021-01-04 15:20:00")).toDF("id","t1","t2")
df = (df.withColumn("t1",to_timestamp($"t1","yyyy-MM-dd HH:mm:ss"))
      .withColumn("t2",to_timestamp($"t2","yyyy-MM-dd HH:mm:ss")))

df = df.withColumn("t3",least($"t1",$"t2"))
df.show()
+---+-------------------+-------------------+-------------------+
| id|                 t1|                 t2|                 t3|
+---+-------------------+-------------------+-------------------+
|  1|2021-01-04 16:10:00|2021-01-04 15:20:00|2021-01-04 15:20:00|
+---+-------------------+-------------------+-------------------+

The opposite of least is greatest, if you want to get the larger one of the two columns.

Note that both least and greatest will ignore null values, but they will return null if all input columns are null.
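Those null semantics can be sketched in plain Scala (an illustration of the behavior only, not Spark's implementation; `leastLike` is a hypothetical helper name):

```scala
import java.sql.Timestamp

// Mimics least's null handling: nulls are ignored, and the result
// is empty (null in Spark) only when every input is null.
def leastLike(ts: Timestamp*): Option[Timestamp] = {
  val nonNull = ts.filter(_ != null)
  if (nonNull.isEmpty) None else Some(nonNull.minBy(_.getTime))
}
```

So `leastLike(null, t)` yields the non-null timestamp, while `leastLike(null, null)` yields nothing.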

Upvotes: 1

jwvh

Reputation: 51271

Try this:

(Option(t1) ++ Option(t2)).minOption

It should do the same job as your if..else if..else stack.


Oops. My bad. Spark doesn't do Scala 2.13.x, and minOption is new in 2.13. Try this instead:

util.Try((Option(t1) ++ Option(t2)).minBy(_.getTime())).toOption
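The expression is plain Scala, so it can be checked without a Spark session. A minimal sketch, with a hypothetical wrapper name `minTs` (this is what you would put inside the udf body):

```scala
import java.sql.Timestamp
import scala.util.Try

// Option(x) is None when x is null, so nulls simply drop out of the
// concatenation; minBy on an empty collection throws, which Try turns
// into None - matching the if..else stack's behavior.
def minTs(t1: Timestamp, t2: Timestamp): Option[Timestamp] =
  Try((Option(t1) ++ Option(t2)).minBy(_.getTime)).toOption

val a = Timestamp.valueOf("2021-01-04 16:10:00")
val b = Timestamp.valueOf("2021-01-04 15:20:00")
// minTs(a, b) == Some(b); minTs(a, null) == Some(a); minTs(null, null) == None
```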

Upvotes: 1
