Shree Batale

Reputation: 237

How to count the null values in each column of a table using Spark (Scala)?

I have a table named "data" which has 5 columns, and each column contains some null values. I want to get the null count of each column. It is easy to count the nulls of one column, but how can I write code that counts them for every column of the table?
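For one column, for example, I can do this (with spark as my SparkSession):

import org.apache.spark.sql.functions.col

val df = spark.table("data")
// nulls in the single column named "2"
val singleColumnNulls = df.filter(col("2").isNull).count()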

Sample:

+------+------+------+------+------+
| 2    | 3    | 4    | 5    | 6    |
+------+------+------+------+------+
| null | 1    | null | null | null |
| null | null | null | null | asdc |
| null | 23   | 23   | null | null |
| null | null | null | 23   | 41   |
| 24   | 3    | 35   | null | null |
| null | null | null | 1    | wef  |
| null | 32   | 54   | null | 45   |
| null | null | null | 123  | null |
| w411 | 31   | 12   | null | null |
| null | null | null | 11   | null |
+------+------+------+------+------+

How do I take the null count of each column?

I have 40 tables, each with 5, 6, or 10 columns, and each column contains some null values. I just want the null count of each column of every table. What is the best way to get it?
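Conceptually, what I'm after is something like this sketch (the table names here are just placeholders for my real tables):

val tableNames = Seq("data", "table2", "table3") // placeholders for the 40 tables

tableNames.foreach { name =>
  val df = spark.table(name)
  // for every column of df: its null count -- this is the missing piece
}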

Thanks in advance!

Upvotes: 1

Views: 2378

Answers (2)

Alexey Popkov

Reputation: 9425

Okay, but what if I have hundreds of columns? How can I take the null count then? – Shree Batale

import org.apache.spark.sql.functions._
import spark.implicits._ // for toDF and the $"..." column syntax

// Demo input: this sample uses 0s as the "missing" marker alongside nulls,
// so the count below looks for both
val myTableDF = Seq(
  (1, 100, 0, 0, 0, 0, 0),
  (2, 0, 50, 0, 0, 20, 0),
  (3, 0, 0, 0, 0, 0, 0),
  (4, 0, 0, 0, 0, 0, 0)
).toDF("column1", "column2", "column3", "column4", "column5", "column6", "column7")

[screenshot: the demo table]

val inputDF = myTableDF

println("Total " + inputDF.count() + " rows in the input DataFrame\n")

// One aggregate per column: count the values that are null or 0; the constant
// "pivot" column is added so the single result row can be transposed below
val countsDF = inputDF
  .select(inputDF.columns.map(c => count(when(col(c).isNull or col(c) === 0, c)).alias(c)): _*)
  .withColumn("pivot", lit("Nulls and 0s count"))
  .cache()

[screenshot: the one-row counts DataFrame]

// Build an array of (column name, count) structs from every column except
// the trailing "pivot" one, and explode it into one row per column
val kv = explode(array(countsDF.columns.dropRight(1).map {
  c => struct(lit(c).alias("k"), col(c).alias("v"))
}: _*))

// Transpose: pivoting on the constant "pivot" value yields one row per
// original column, with its count in the "Nulls and 0s count" column
val countsTransposedDF = countsDF
  .withColumn("kv", kv)
  .select($"pivot", $"kv.k", $"kv.v")
  .groupBy($"k")
  .pivot("pivot")
  .agg(first($"v"))
  .withColumnRenamed("k", "Column Name")

countsTransposedDF.show(100, false)

[screenshot: the transposed counts]

Upvotes: 0

Avseiytsev Dmitriy

Reputation: 1160

If you don't want to drop empty rows/columns and you don't need to do any additional calculations in your job, this should work for you:

import org.apache.spark.sql.functions.{col, count, when}

// One aggregate per column: when(...) is null unless the value is null,
// and count() skips nulls, so each aggregate counts that column's nulls
df.select(df.columns.map { colName =>
  count(when(col(colName).isNull, true)) as s"${colName}_nulls_count"
}: _*)
  .show(10) // or save the result somewhere
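
Since the question mentions 40 tables, the same select can be wrapped in a loop over table names. A sketch, assuming the tables are registered in the catalog (the names below are hypothetical):

Seq("data", "table2", "table3").foreach { name => // your real table names
  val df = spark.table(name)
  println(s"Null counts for $name:")
  df.select(df.columns.map { colName =>
    count(when(col(colName).isNull, true)) as s"${colName}_nulls_count"
  }: _*).show(false)
}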

Upvotes: 8
