slhsen

Reputation: 616

Why "select unix_timestamp('') is null" returns false when "select unix_timestamp('')" returns null?

I am using Spark 1.6.2 and trying to determine whether a field contains an empty string or a date value.

The Spark documentation explains that unix_timestamp() returns null if it fails, so the behaviour below is expected:

sqlContext.sql("select unix_timestamp('')").show
+----+
| _c0|
+----+
|null|
+----+

But when I check it with "is null", it returns false:

sqlContext.sql("select unix_timestamp('') is null").show
+-----+
|  _c0|
+-----+
|false|
+-----+

The same query returns true in Hive:

hive> select unix_timestamp('') is null;
OK
true

For the sake of completeness, here is a null check on a literal null:

sqlContext.sql("select null is null").show
+----+
| _c0|
+----+
|true|
+----+

Upvotes: 2

Views: 335

Answers (1)

zero323

Reputation: 330283

This is a bug, and it looks like it has already been resolved in the Spark 2.x branch (probably with SPARK-12054). The source of the problem is the schema returned by unix_timestamp. If you execute:

sqlContext.sql("select unix_timestamp('')").printSchema

you'll see:

root
 |-- _c0: long (nullable = false)

Since the schema is reported as not nullable, the values are never checked, and unix_timestamp(...).isNull is always false.
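
If you are stuck on 1.6, one possible workaround, sketched below under the assumption that your strings use a fixed format, is to parse the timestamp in your own UDF returning Option[Long]; a UDF with an Option result is registered as nullable, so the null check is actually evaluated instead of being folded away. The name unix_ts and the date pattern here are illustrative, not a built-in API:

import java.text.SimpleDateFormat
import scala.util.Try

// Sketch of a workaround: parse the string ourselves and return
// Option[Long]. Parsing failures (e.g. an empty string) become None,
// which Spark SQL surfaces as null in a nullable column.
sqlContext.udf.register("unix_ts", (s: String) =>
  Try(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    .parse(s).getTime / 1000L).toOption)

sqlContext.sql("select unix_ts('') is null").show
// should print true: '' fails to parse, the UDF returns None,
// and the column is nullable, so the check is not optimized away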

Upvotes: 5
