Reputation: 475
+-------------------+------+---+
| timestamp|status|msg|
+-------------------+------+---+
|2017-01-01 06:15:00| ASC_a| 1|
|2017-01-01 06:19:00| start| 2|
|2017-01-01 06:22:00| ASC_b| 2|
|2017-01-01 06:30:00| null| 2|
|2017-01-01 10:45:00| ASC_a| 3|
|2017-01-01 10:50:00| null| 3|
|2017-01-01 11:22:00| ASC_c| 4|
|2017-01-01 11:31:00| end| 5|
+-------------------+------+---+
When I filter on the status column, the rows with a null status are dropped as well:
ds.where($"status" =!= "ASC_b").show
+-------------------+------+---+
| timestamp|status|msg|
+-------------------+------+---+
|2017-01-01 06:15:00| ASC_a| 1|
|2017-01-01 06:19:00| start| 2|
|2017-01-01 10:45:00| ASC_a| 3|
|2017-01-01 11:22:00| ASC_c| 4|
|2017-01-01 11:31:00| end| 5|
+-------------------+------+---+
I have all those null values because I did a left outer join of two datasets.
I want the filtered output to keep the rows with null status as well.
Is there a way to accomplish this?
Upvotes: 0
Views: 286
Reputation: 41987
You can include a null check in your filter:
ds.where($"status" =!= "ASC_b" || $"status".isNull).show
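The extra isNull clause is needed because Spark follows SQL three-valued logic: null =!= "ASC_b" evaluates to null (unknown), and where only keeps rows whose predicate is true. A minimal plain-Scala sketch of that logic, modelling the nullable column as Option[String] (the names and sample values are illustrative, not from the question's dataset):

```scala
// Model a nullable status column as Option[String]; None stands for null.
val rows = Seq(Some("ASC_a"), Some("ASC_b"), None, Some("ASC_c"))

// Three-valued logic: comparing null yields "unknown" (here: None).
def neq(status: Option[String], v: String): Option[Boolean] =
  status.map(_ != v)

// Like ds.where($"status" =!= "ASC_b") -- the null row is dropped,
// because its predicate is None, not true:
val dropped = rows.filter(s => neq(s, "ASC_b").contains(true))

// Like ds.where($"status" =!= "ASC_b" || $"status".isNull) -- null kept:
val kept = rows.filter(s => neq(s, "ASC_b").contains(true) || s.isEmpty)

println(dropped) // List(Some(ASC_a), Some(ASC_c))
println(kept)    // List(Some(ASC_a), None, Some(ASC_c))
```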
Upvotes: 2
Reputation: 35249
Just negate the null-safe equality operator <=>:
import org.apache.spark.sql.functions.not
ds.where(not($"status" <=> "ASC_b")).show
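This works because <=> (null-safe equality, SQL's IS NOT DISTINCT FROM) always returns true or false, never null, so its negation keeps the null rows. A plain-Scala sketch of the same idea, again using Option[String] as a stand-in for the nullable column (sample values are illustrative):

```scala
// Model a nullable status column as Option[String]; None stands for null.
val rows = Seq(Some("ASC_a"), Some("ASC_b"), None, Some("ASC_c"))

// null <=> "ASC_b" is false (not null), unlike null = "ASC_b",
// so negating it yields true for the null row and keeps it.
def eqNullSafe(status: Option[String], v: String): Boolean =
  status.contains(v)

// Like ds.where(not($"status" <=> "ASC_b")):
val result = rows.filterNot(s => eqNullSafe(s, "ASC_b"))

println(result) // List(Some(ASC_a), None, Some(ASC_c))
```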
Upvotes: 2