joesan

Reputation: 15435

Spark DataFrame Get Null Count For All Columns

I have a DataFrame for which I would like to get the total null value count, and I have the following code that does this generically across all columns:

First my DataFrame that just contains one column (for simplicity):

val recVacDate = dfRaw.select("STATE")

When I print the count using a simple filter, I see the following:

val filtered = recVacDate.filter("STATE is null")
println(filtered.count()) // Prints 94051

But when I use the code below, I get just 1 as the result, and I do not understand why:

val nullCount = recVacDate.select(recVacDate.columns.map(c => count(col(c).isNull || col(c) === "" || col(c).isNaN).alias(c)): _*) 
println(nullCount.count()) // Prints 1

Any ideas as to what is wrong with nullCount? The column's data type is String.

Upvotes: 1

Views: 1625

Answers (1)

joesan

Reputation: 15435

This kind of fixed it:

df.select(df.columns.map(c => count(when(col(c).isNull || col(c) === "" || col(c).isNaN, c)).alias(c)): _*)

Notice the use of the when clause inside the count.
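
For completeness, here is a minimal end-to-end sketch of the fixed approach (assuming the same dfRaw and STATE column as in the question). The original version printed 1 because the select produces a single aggregated row, and DataFrame.count() counts rows, not null values; the per-column totals are the values inside that row, so read them with show() or first(). Also, without when, count tallies every non-null result of the boolean expression, including false, rather than only the matching rows.

import org.apache.spark.sql.functions.{col, count, when}

val recVacDate = dfRaw.select("STATE")

// One aggregate per input column: count rows where the value is null, empty, or NaN.
val nullCounts = recVacDate.select(
  recVacDate.columns.map(c =>
    count(when(col(c).isNull || col(c) === "" || col(c).isNaN, c)).alias(c)
  ): _*
)

// The aggregation always yields exactly one row; read the totals from it.
nullCounts.show()
println(nullCounts.first().getLong(0)) // total nulls/empties/NaNs in STATE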

Upvotes: 1
