Reputation: 616
Using Spark 1.6.2 and trying to find whether a field contains an empty string or a date value.
The Spark documentation explains that the unix_timestamp() function returns null if it fails, so the behaviour below is expected:
sqlContext.sql("select unix_timestamp('')").show
+----+
| _c0|
+----+
|null|
+----+
But when I try to check it with "is null", it returns false:
sqlContext.sql("select unix_timestamp('') is null").show
+-----+
| _c0|
+-----+
|false|
+-----+
The same query returns true in Hive:
hive> select unix_timestamp('') is null;
OK
true
And here is the null check on an actual null, for the sake of completeness:
sqlContext.sql("select null is null").show
+----+
| _c0|
+----+
|true|
+----+
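For context, the kind of check I am ultimately after looks roughly like this (the table and column names here are made up for illustration):

// "events" and "ts_field" are hypothetical; the goal is to flag rows
// whose value does not parse as a date.
sqlContext.sql(
  "select ts_field, unix_timestamp(ts_field, 'yyyy-MM-dd') is null as not_a_date from events"
).show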
Upvotes: 2
Views: 335
Reputation: 330283
This is a bug, and it looks like it has already been resolved in the Spark 2.x branch (probably with SPARK-12054). The source of the problem is the schema returned by unix_timestamp. If you execute:
sqlContext.sql("select unix_timestamp('')").printSchema
you'll see:
root
|-- _c0: long (nullable = false)
Since the schema is reported as not nullable, values are not checked, and unix_timestamp(...).isNull is always false.
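Until you can upgrade, one possible workaround (my own sketch, not part of the upstream fix) is to pass the value through an identity UDF. UDF results are marked nullable, so the null check is actually evaluated instead of being folded to a constant:

// Register an identity UDF; using java.lang.Long lets a null value
// pass through unchanged. "asNullable" is just a name I picked.
sqlContext.udf.register("asNullable", (x: java.lang.Long) => x)

// The wrapped value now carries a nullable schema, so "is null"
// should return true for the failed parse.
sqlContext.sql("select asNullable(unix_timestamp('')) is null").show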
Upvotes: 5