Dev

Reputation: 13773

Selecting larger value in 2 timestamp columns in pyspark

Code I am currently using:

import pyspark.sql.functions as F

F.when((df.UPDAT_DT.cast("long") - df.CREAT_DT.cast("long")) >= 0,
       df.UPDAT_DT).otherwise(df.CREAT_DT).alias('DT')

UPDAT_DT and CREAT_DT are timestamp columns.

I started with datediff, but I wanted to compare at the second level.

Is there a better way to do this?

Upvotes: 1

Views: 52

Answers (1)

Shaido

Reputation: 28422

Since both columns are of timestamp type, you can compare them directly with <= and >=; there is no need to cast them.

In other words, you can do:

F.when(df.UPDAT_DT >= df.CREAT_DT, df.UPDAT_DT).otherwise(df.CREAT_DT).alias('DT')

Since you just want the larger of the two values, you can also use the greatest function:

F.greatest(df.CREAT_DT, df.UPDAT_DT).alias('DT')

Upvotes: 2

Related Questions