Baptiste Moisson

Reputation: 53

Spark filter function is filtering out null values when it is not expected

The Spark filter function is filtering out null values when it shouldn't. My condition $"test" =!= "T" should not remove the null rows.

import spark.implicits._

// the "test" column contains nulls
val seq = Seq((null, "T"), (null, "F"), (null, "F"), ("F", "F"), ("T", "C"))

val df = spark.sparkContext.parallelize(seq).toDF("test", "bala")

df.show()

df.filter($"test" =!= "T").show()

Upvotes: 0

Views: 1320

Answers (1)

mck

Reputation: 42422

Comparing null with anything returns null, which is treated as false by the filter, so the null rows are dropped. To get around this, you can negate eqNullSafe, e.g.

df.filter(!$"test".eqNullSafe("T")).show
+----+----+
|test|bala|
+----+----+
|null|   T|
|null|   F|
|null|   F|
|   F|   F|
+----+----+
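
As a minimal alternative sketch, you can also keep the null rows explicitly with isNull; this produces the same output as above:

df.filter($"test".isNull || $"test" =!= "T").show

In Spark SQL, the equivalent null-safe equality comparison is written with the <=> operator.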

Upvotes: 3
