Reputation: 2201
I'm trying to count empty values in a column of a DataFrame like this:
df.filter((df(colname) === null) || (df(colname) === "")).count()
Here colname holds the name of the column. This works fine if the column type is string, but if the column type is integer and there are some nulls, this code always returns 0. Why is that, and how can I change it to make it work?
Upvotes: 3
Views: 16045
Reputation: 41957
As mentioned in the question, df.filter((df(colname) === null) || (df(colname) === "")).count() works for String data types, but testing shows that nulls are not handled. @Psidom's answer handles both null and empty, but it does not handle NaN. Adding a check with .isNaN should handle all three cases:
df.filter(df(colName).isNull || df(colName) === "" || df(colName).isNaN).count()
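For a quick check, here is a minimal, self-contained sketch (the column name "value" and the sample rows are made up for illustration) showing the combined filter on a Double column that contains an ordinary value, a null, and a NaN:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._
// hypothetical sample: one ordinary value, one null, one NaN
val sample = Seq(Some(1.0), None, Some(Double.NaN)).toDF("value")
// isNull catches the null row, isNaN catches the NaN row,
// and === "" covers empty strings when the column is a string type
sample.filter(sample("value").isNull || sample("value") === "" || sample("value").isNaN).count()
// expected: 2 (the null row and the NaN row)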
Upvotes: 5
Reputation: 214927
You can use isNull to test the null condition:
val df = Seq((Some("a"), Some(1)), (null, null), (Some(""), Some(2))).toDF("A", "B")
// df: org.apache.spark.sql.DataFrame = [A: string, B: int]
df.filter(df("A").isNull || df("A") === "").count
// res7: Long = 2
df.filter(df("B").isNull || df("B") === "").count
// res8: Long = 1
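For contrast (same df as above), a plain equality test against null never matches anything: an SQL comparison with null evaluates to null rather than true, and filter drops those rows, which is why the filter in the question always returns 0 for the integer column.
df.filter(df("B") === null).count
// returns 0: the comparison yields null for every row, so no rows pass the filter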
Upvotes: 2