Reputation: 77
I am writing queries in databricks using sql on views and would like to calculate max of dates of update timestamp column across multiple views. for instance i am joining table a with table b and would like to know max(a.updt_ts,b.updt_ts). since max function can not have more than one columns mentioned, i want to create a function. Any help is greatly appreciated. below is what i have:
CREATE temporary FUNCTION ufnGetMaxDt (@Date1 DATETIME2,@Date2 DATETIME2)
BEGIN
DECLARE @ret DATETIME2
, @MinDt datetime2;
SET @MinDt = cast('1900-01-01' as datetime2);
IF (@Date1) is null SET @Date1 = @MinDt;
IF (@Date2) is null SET @Date2 = @MinDt;
SET @ret = CASE When @Date1 >= @Date2
Then @Date1
else @Date2
END;
IF (@ret IS NULL)
SET @ret = @MinDt; -- Dummy date
RETURN @ret;
END
GO
Upvotes: 1
Views: 3358
Reputation: 14389
You could just use greatest
? eg
SELECT *, GREATEST( date1, date2 ) xmax
FROM tmp
Or put them in an array, explode it and then max that? eg something like this:
%sql
WITH cte AS
(
SELECT *, EXPLODE( ARRAY( date1, date2 ) ) xmax
FROM tmp
)
SELECT MAX( xmax )
FROM cte
Seems a bit excessive when you can just use greatest
though? It's also worth having a read through the list of Spark SQL built-in functions. You don't have to remember them all but at least if you know something is possible it's useful:
https://spark.apache.org/docs/2.3.0/api/sql/index.html
Upvotes: 1