Reputation: 191
I have a DataFrame (df) with the following:
+---------+------------+
|     col1|        col2|
+---------+------------+
|colvalue1|        NULL|
|colvalue2|col2value...|
+---------+------------+
I am trying to filter out rows based on col2 as follows:
df.filter(($"col2".isNotNULL) || ($"col2" !== "NULL") || ($"col2" !== "null") || ($"col2".trim !== "NULL"))
But the row that has NULL is not being filtered out. The column shows nullable=true.
Can anyone tell me what mistake I am making? I am using Spark 1.6.
Upvotes: 1
Views: 12793
Reputation: 41957
Your !==
notation is wrong; it should be =!=
. You also can't do $"col2".trim
. And since you have combined negations with ||
, at least one of the negations is true for every row. In your example ($"col2".isNotNull)
is true for all the non-null rows, so every row is filtered in. Individual negations combined with ||
should be used with care.
So the correct form is:
df.filter(!($"col2".isNull || ($"col2" === "NULL") || ($"col2" === "null")))
Or, even better, use the built-in functions isnull
and trim:
df.filter(!(isnull($"col2") || (trim($"col2") === "NULL") || (trim($"col2") === "null")))
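The boolean-logic problem can be sketched without Spark at all, using a plain Scala collection (the values below are hypothetical, mirroring the question's data):

```scala
// Minimal sketch (plain Scala, no Spark): why OR-ing negated
// conditions keeps every row.
val values = Seq(null, "NULL", "null", "colvalue2")

// The question's predicate shape: for every value at least one
// disjunct is true, so nothing is filtered out.
val buggy = values.filter(v => (v != null) || (v != "NULL") || (v != "null"))
// buggy still contains all four values

// Negate the whole disjunction instead (De Morgan), as in the answer.
val fixed = values.filter(v => !(v == null || v == "NULL" || v == "null"))
// fixed == Seq("colvalue2")
```

The same reasoning applies to the Column expressions: negate the combined condition once, rather than OR-ing individually negated conditions.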
Upvotes: 6