Reputation: 13773
Code I am currently using:
import pyspark.sql.functions as F

# Keep the later of the two timestamps by comparing them as epoch seconds
F.when((df.UPDAT_DT.cast("long") - df.CREAT_DT.cast("long")) >= 0,
       df.UPDAT_DT).otherwise(df.CREAT_DT).alias('DT')
UPDAT_DT and CREAT_DT are timestamp columns.
I started with datediff, but I wanted to compare at the second level. datediff only counts whole days, so a 10-second difference within the same day comes out as 0:
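import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical row: two timestamps on the same day, 10 seconds apart
df = spark.createDataFrame(
    [("2023-01-01 12:00:00", "2023-01-01 12:00:10")],
    ["CREAT_DT", "UPDAT_DT"],
).withColumn("CREAT_DT", F.col("CREAT_DT").cast("timestamp")) \
 .withColumn("UPDAT_DT", F.col("UPDAT_DT").cast("timestamp"))

# datediff works at day granularity, so this prints 0
df.select(F.datediff("UPDAT_DT", "CREAT_DT").alias("days")).show()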
Is there a better way to do this?
Upvotes: 1
Views: 52
Reputation: 28422
Since both columns are of timestamp type, you can compare them directly with <= and >=; there is no need to cast them.
In other words, you can do:
F.when(df.UPDAT_DT >= df.CREAT_DT, df.UPDAT_DT).otherwise(df.CREAT_DT).alias('DT')
You can also use the greatest function, since you just want the max value:
F.greatest(df.CREAT_DT, df.UPDAT_DT).alias('DT')
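For completeness, here is a runnable sketch of both expressions side by side, using hypothetical sample data (column names follow the question):

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical rows: UPDAT_DT is later in the first row, earlier in the second
df = spark.createDataFrame(
    [("2023-01-01 12:00:00", "2023-01-01 12:00:10"),
     ("2023-01-02 09:30:00", "2023-01-02 09:00:00")],
    ["CREAT_DT", "UPDAT_DT"],
).withColumn("CREAT_DT", F.col("CREAT_DT").cast("timestamp")) \
 .withColumn("UPDAT_DT", F.col("UPDAT_DT").cast("timestamp"))

# Both expressions pick the later of the two timestamps
df.select(
    F.when(df.UPDAT_DT >= df.CREAT_DT, df.UPDAT_DT).otherwise(df.CREAT_DT).alias("DT_when"),
    F.greatest(df.CREAT_DT, df.UPDAT_DT).alias("DT_greatest"),
).show(truncate=False)

One subtle difference: greatest skips nulls, while the when/otherwise version falls back to CREAT_DT whenever the comparison evaluates to null, so the two can disagree if CREAT_DT is null and UPDAT_DT is not.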
Upvotes: 2