Reputation: 53
The Spark filter function is filtering out null values when it shouldn't. My condition $"test" =!= "T"
should only remove the rows where test is "T", not the null rows.
import spark.implicits._

// The "test" column contains nulls alongside "F" and "T"
val seq = Seq((null, "T"), (null, "F"), (null, "F"), ("F", "F"), ("T", "C"))
val df = spark.sparkContext.parallelize(seq).toDF("test", "bala")
df.show()
df.filter($"test" =!= "T").show()
Upvotes: 0
Views: 1320
Reputation: 42422
Comparing null with anything returns null, which a filter treats as false. To get around this, you can negate the null-safe equality operator eqNullSafe (Spark SQL's <=>), e.g.
df.filter(!$"test".eqNullSafe("T")).show
+----+----+
|test|bala|
+----+----+
|null| T|
|null| F|
|null| F|
| F| F|
+----+----+
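If you prefer to keep the original operator, an explicit null check gives the same result; a minimal sketch using the same df as in the question:

// Keep rows where test is not "T"; the isNull disjunct explicitly retains null rows
df.filter($"test" =!= "T" || $"test".isNull).show

// To see the three-valued logic at work, project the comparison itself:
// it evaluates to null (not false) on the null rows, and filter drops nulls
df.select($"test", ($"test" =!= "T").as("neq_T")).show

The alias neq_T is just for illustration; the key point is that the comparison column shows null, true, or false per row.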
Upvotes: 3