Spark Scala dataframe division and IF

Question

I am trying to avoid division by zero in this calculation (value-(mean/stddev)

This is what I came up with:

%sql 
select id,cykle,
s2-(a2/IF(sd2 == 0.0, 0.00000001, sd2)) as std2,
s3-(a3/IF(sd3 == 0.0, 0.00000001, sd3)) as std3,
s4-(a4/IF(sd4 == 0.0, 0.00000001, sd4)) as std4
from XXX where id = 1 order by cykle

Two questions - better way? - can I use division and IF in scala code this does not work :

($"s1" -  ("$a1"  / IF("$sd1"  == lit(0.0),lit(0.00000001),"$sd1"))).as("std1")

zero323 · Accepted Answer

You can either coalesce:

COALESCE(a2 / sd, a2 / 0.00000001)

val c: Double = ???
coalesce($"a2" / $"sd", $"a2" / lit(c))

or use CASE ... WHEN construct:

CASE sd
    WHEN 0 THEN a2 / 0.00000001
    ELSE a2 / sd
END

when($"sd" === 0, $"a2" / lit(c)).otherwise($"a2" / $"sd")

although if your goal is scaling then keeping defaulting to 0 would make more sense.

Spark Scala dataframe division and IF

Answers (2)

Related Questions