user2811630

Reputation: 475

`where` ignores null values in a Dataset

+-------------------+------+---+
|          timestamp|status|msg|
+-------------------+------+---+
|2017-01-01 06:15:00| ASC_a|  1|
|2017-01-01 06:19:00| start|  2|
|2017-01-01 06:22:00| ASC_b|  2|
|2017-01-01 06:30:00|  null|  2|
|2017-01-01 10:45:00| ASC_a|  3|
|2017-01-01 10:50:00|  null|  3|
|2017-01-01 11:22:00| ASC_c|  4|
|2017-01-01 11:31:00|   end|  5|
+-------------------+------+---+

ds.where($"status" =!= "ASC_b").show

+-------------------+------+---+
|          timestamp|status|msg|
+-------------------+------+---+
|2017-01-01 06:15:00| ASC_a|  1|
|2017-01-01 06:19:00| start|  2|
|2017-01-01 10:45:00| ASC_a|  3|
|2017-01-01 11:22:00| ASC_c|  4|
|2017-01-01 11:31:00|   end|  5|
+-------------------+------+---+

I have all those null values because I did a left outer join of two Datasets. I want the filtered output to still include the rows where `status` is null.
Is there a way to accomplish this?

Upvotes: 0

Views: 286

Answers (2)

Ramesh Maharjan

Reputation: 41987

You can include a null check in your filter:

ds.where($"status" =!= "ASC_b" || $"status".isNull).show
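Why this works: under SQL's three-valued logic, `null =!= "ASC_b"` evaluates to null ("unknown"), and `where` keeps a row only when the predicate is exactly true, so null rows are silently dropped; the explicit `isNull` disjunct rescues them. A minimal plain-Python sketch of that logic (illustrative only, not Spark code; `None` stands in for SQL NULL):

```python
# Three-valued logic sketch: None plays the role of SQL NULL ("unknown").
def neq(a, b):
    """Mimics `=!=`: result is unknown (None) when either operand is NULL."""
    return None if a is None or b is None else a != b

def keep(status):
    # `where` keeps a row only when the predicate is exactly True.
    pred = neq(status, "ASC_b")
    # The `or status is None` term mirrors `|| $"status".isNull`.
    return pred is True or status is None

rows = ["ASC_a", "start", "ASC_b", None]
print([r for r in rows if keep(r)])  # ['ASC_a', 'start', None]
```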

Upvotes: 2

Alper t. Turker

Reputation: 35249

Just negate the `<=>` (null-safe equality) operator:

import org.apache.spark.sql.functions.not

ds.where(not($"status" <=> "ASC_b")).show
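`<=>` is Spark's null-safe equality (`eqNullSafe` in the Column API): unlike `===`, it always returns true or false, never null, treating two nulls as equal. Its negation therefore keeps the null rows without a separate `isNull` check. A plain-Python sketch of those semantics (illustrative only, not Spark code):

```python
def null_safe_eq(a, b):
    """Mimics `<=>`: always True/False, never unknown."""
    if a is None or b is None:
        return a is None and b is None
    return a == b

rows = ["ASC_a", "start", "ASC_b", None]
# not(<=>) drops only the exact "ASC_b" matches; null rows survive.
print([r for r in rows if not null_safe_eq(r, "ASC_b")])
# ['ASC_a', 'start', None]
```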

Upvotes: 2
