Reputation: 169
I have a simple UDF that returns a value based on its input parameters, but when the parameters are empty (null) it does not return the default case. I'd appreciate any help in correcting my understanding.
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._ // provides toDF and $"col"; spark-shell imports this automatically

// Returns a label based on a, b and c; "NA" is meant to be the default case.
val test = udf((a: Double, b: Double, c: Boolean) => {
  if (a >= 6 && !c) {
    "smith"
  } else if (a >= 20 && !c) { // never reached: any a >= 20 already matched a >= 6
    "Fred"
  } else if ((a < 6 || b < 2) && !c) {
    "Ross"
  } else {
    "NA"
  }
})

val ds1 = Seq(
  (1, "test", true),
  (2, "test2", false),
  (3, "teste", false)
).toDF("id", "name", "flag")

val ds2 = Seq(
  (2, 6, 4),
  (3, 0, 0)
).toDF("id", "flag2", "flag3")

// Left outer join: id 1 has no match in ds2, so its flag2 and flag3 are null.
var combined = (ds1.as("n")
  .join(ds2.as("p"), $"n.id" === $"p.id", "left_outer")
  .select($"n.id", $"n.name", $"n.flag", $"flag2", $"flag3"))

combined = combined.withColumn("newcol", test($"flag2", $"flag3", $"flag"))
combined.show(5, false)
For the row with id = 1, the UDF should return "NA", since it meets none of the criteria in the UDF, but instead it returns null.
Also, how can I populate empty/null values for the flag2 and flag3 columns in ds2? For example, I tried seq(3,null.asInstanceOf[Double],null.asInstanceOf[Double]) and got an error.
Upvotes: 1
Views: 1046
Reputation: 18023
For your understanding, then:
Scala uses Java primitives. Double and Int primitives in Java must always hold a value, i.e. null is not acceptable. The UDF is therefore not invoked for the entry with id 1: its parameters are declared as Double, but for that row the values are null (the left outer join found no match in ds2), so Spark returns null for the whole expression instead. If you understand this, then you should be able to devise a suitable solution.
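A minimal sketch of one such solution, written in plain Scala without Spark so the branch logic can be checked on its own: declare a and b as boxed java.lang.Double instead of primitive Double, so a null can actually reach the function and be routed to the default branch. The object name NullSafeUdfLogic is hypothetical, and the branches are copied from the question's UDF. In Spark, the function would be wrapped with udf(NullSafeUdfLogic.classify _); that Spark passes a SQL NULL into a boxed parameter as null is the commonly relied-on behavior, but treat it as an assumption to verify against your Spark version.

```scala
// Hypothetical helper: same branches as the question's UDF, but a and b are
// boxed java.lang.Double, so a missing (null) value can reach the code.
object NullSafeUdfLogic {
  def classify(a: java.lang.Double, b: java.lang.Double, c: Boolean): String =
    if (a == null || b == null) "NA"          // missing input: default case
    else if (a >= 6 && !c) "smith"            // Predef unboxes a to Double here
    else if (a >= 20 && !c) "Fred"            // unreachable, kept from the question
    else if ((a < 6 || b < 2) && !c) "Ross"
    else "NA"

  def main(args: Array[String]): Unit = {
    println(classify(null, null, true))  // id 1 after the left outer join
    println(classify(6.0, 4.0, false))   // id 2
    println(classify(0.0, 0.0, false))   // id 3
  }
}
```

With boxed parameters the null check is explicit, so the "NA" default is reached instead of Spark short-circuiting the call.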
Upvotes: 1
Reputation: 183
The UDF is not executed for rows whose input values are null; Spark returns null for those rows instead. Handle the null values in the combined DataFrame before applying the UDF. One option is to replace the nulls with 0:
val new_combined = combined.na.fill(0).withColumn("newcol",test($"flag2",$"flag3",$"flag"))
new_combined.show(5,false)
+---+-----+-----+-----+-----+------+
|id |name |flag |flag2|flag3|newcol|
+---+-----+-----+-----+-----+------+
|1 |test |true |0 |0 |NA |
|2 |test2|false|6 |4 |smith |
|3 |teste|false|0 |0 |Ross |
+---+-----+-----+-----+-----+------+
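On the second part of the question (putting null into flag2 and flag3 of ds2): null.asInstanceOf[Double] cannot produce a null, because casting null to a primitive Double yields 0.0. A missing value is usually expressed with Option instead. Below is a minimal plain-Scala sketch; the object name Ds2Rows is hypothetical, and getOrElse(0) stands in for what na.fill(0) does on the DataFrame. With spark.implicits._ in scope, the same rows can be passed to toDF("id", "flag2", "flag3"), where None should become null in nullable integer columns.

```scala
// Hypothetical object modelling ds2's rows with missing flags.
object Ds2Rows {
  // Casting null to a primitive does not give null: it gives the default 0.0.
  val castNull: Double = null.asInstanceOf[Double]

  // ds2's rows with the id-3 flags missing; None marks an absent value.
  val rows: Seq[(Int, Option[Int], Option[Int])] =
    Seq((2, Some(6), Some(4)), (3, None, None))

  // Plain-Scala analogue of na.fill(0): replace each missing flag with 0.
  val filled: Seq[(Int, Int, Int)] =
    rows.map { case (id, f2, f3) => (id, f2.getOrElse(0), f3.getOrElse(0)) }

  def main(args: Array[String]): Unit = {
    println(castNull)
    filled.foreach(println)
  }
}
```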
https://docs.databricks.com/spark/latest/spark-sql/udf-scala.html
Upvotes: 1