Reputation: 27383
Consider the following Spark 2.1 code:
val df = Seq("Raphael").toDF("name")
df.show()
+-------+
| name|
+-------+
|Raphael|
+-------+
val squareUDF = udf((d:Double) => Math.pow(d,2))
df.select(squareUDF($"name")).show
+---------+
|UDF(name)|
+---------+
| null|
+---------+
Why do I get null? I was expecting something like a ClassCastException, since I am trying to map a String to a Scala Double:
"Raphael".asInstanceOf[Double]
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Double
Upvotes: 2
Views: 873
Reputation: 35249
It is easy to figure out if you check the execution plan:
scala> df.select(squareUDF($"name")).explain(true)
== Parsed Logical Plan ==
'Project [UDF('name) AS UDF(name)#51]
+- AnalysisBarrier Project [value#36 AS name#38]
== Analyzed Logical Plan ==
UDF(name): double
Project [if (isnull(cast(name#38 as double))) null else UDF(cast(name#38 as double)) AS UDF(name)#51]
+- Project [value#36 AS name#38]
+- LocalRelation [value#36]
== Optimized Logical Plan ==
LocalRelation [UDF(name)#51]
== Physical Plan ==
LocalTableScan [UDF(name)#51]
As you can see, Spark performs type casting before applying the UDF:
UDF(cast(name#38 as double))
and SQL casts don't throw exceptions for type-compatible casts. If the actual cast is impossible, the value is undefined (NULL). If the types were incompatible:
Seq((1, ("Raphael", 42))).toDF("id", "name_number").select(squareUDF($"name_number"))
// org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(name_number)' due to data type mismatch: argument 1 requires double type, however, '`name_number`' is of struct<_1:string,_2:int> type.;;
//
// at org.apache...
you'd get an exception.
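You can observe this permissive cast behavior directly (a minimal sketch, assuming the same spark session is in scope):

spark.sql("SELECT CAST('Raphael' AS DOUBLE) AS d").show()
// +----+
// |   d|
// +----+
// |null|
// +----+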
The rest is covered by:
if (isnull(cast(name#38 as double))) null
Since the value is null, the udf is never called.
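If you would rather get an error than a silent null, one option (a sketch, not part of the original answer; squareStrictUDF is a hypothetical name) is to declare the argument as String, so Spark inserts no cast and the explicit conversion fails loudly:

import org.apache.spark.sql.functions.udf

// With a String parameter there is no implicit cast;
// toDouble throws NumberFormatException for non-numeric input.
val squareStrictUDF = udf((s: String) => Math.pow(s.toDouble, 2))
df.select(squareStrictUDF($"name")).show()
// org.apache.spark.SparkException: Failed to execute user defined function
// Caused by: java.lang.NumberFormatException: For input string: "Raphael"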
Upvotes: 3