oluies
oluies

Reputation: 17831

Spark Scala dataframe division and IF

I am trying to avoid division by zero in this calculation (value-(mean/stddev)

This is what I came up with:

%sql 
select id,cykle,
s2-(a2/IF(sd2 == 0.0, 0.00000001, sd2)) as std2,
s3-(a3/IF(sd3 == 0.0, 0.00000001, sd3)) as std3,
s4-(a4/IF(sd4 == 0.0, 0.00000001, sd4)) as std4
from XXX where id = 1 order by cykle

Two questions - better way? - can I use division and IF in scala code this does not work :

($"s1" -  ("$a1"  / IF("$sd1"  == lit(0.0),lit(0.00000001),"$sd1"))).as("std1")

Upvotes: 1

Views: 2335

Answers (2)

Martin Tapp
Martin Tapp

Reputation: 3376

I found the following form more compact:

select(..., 'a2 / coalesce('sd, lit(c)), ...)

Upvotes: 0

zero323
zero323

Reputation: 330083

You can either coalesce:

COALESCE(a2 / sd, a2 / 0.00000001)
val c: Double = ???
coalesce($"a2" / $"sd", $"a2" / lit(c))

or use CASE ... WHEN construct:

CASE sd
    WHEN 0 THEN a2 / 0.00000001
    ELSE a2 / sd
END
when($"sd" === 0, $"a2" / lit(c)).otherwise($"a2" / $"sd")

although if your goal is scaling then keeping defaulting to 0 would make more sense.

Upvotes: 4

Related Questions