Reputation: 237
I have table name "data" which having 5 columns and each column contain some null values. i want to take a count of each column's null value how can i write code for that result! its easy to take count of one column but how can i write code for counting each column of table.
sample :
+----------------+----------------+--------+---------+-------------+
| 2 |3 |4 | 5 |6 |
+----------------+----------------+--------+---------+-------------+
|null |1 | null |null |null |
|null |null | null |null |asdc |
|null |23 | 23 |null |null |
|null |null | null |23 |41 |
|24 |3 | 35 |null |null |
|null |null | null | 1 |wef |
|null |32 | 54 |null |45 |
|null |null | null |123 |null |
|w411 |31 | 12 |null |null |
|null |null | null |11 |null |
+----------------+----------------+--------+---------+-------------+
how take null count of each column
I have 40 tables which contain 5 or 6 or 10 columns and each column contain some null values i just want to take null count of each column of tables which is the best way to take null count!
Thanks in advance!
Upvotes: 1
Views: 2378
Reputation: 9425
okay, but what if i have 100 of columns then how can i take null count? – Shree Batale
val myTableDF = Seq(
(1, 100, 0, 0, 0, 0, 0),
(2, 0, 50, 0, 0, 20, 0),
(3, 0, 0, 0, 0, 0, 0),
(4, 0, 0, 0, 0, 0, 0)
).toDF("column1", "column2", "column3", "column4", "column5", "column6", "column7")
val inputDF = myTableDF
println("Total " + inputDF.count() + " rows in the input DataFrame\n")
val countsDF = inputDF.select(inputDF.columns.map(c => count(when(col(c).isNull or col(c) === 0, c)).alias(c)): _*)
.withColumn("pivot", lit("Nulls and 0s count"))
.cache()
val kv = explode(array(countsDF.columns.dropRight(1).map {
c => struct(lit(c).alias("k"), col(c).alias("v"))
}: _*))
val countsTransposedDF = countsDF
.withColumn("kv", kv)
.select($"pivot", $"kv.k", $"kv.v")
.groupBy($"k")
.pivot("pivot")
.agg(first($"v"))
.withColumnRenamed("k", "Column Name")
countsTransposedDF.show(100, false)
Upvotes: 0
Reputation: 1160
If you don't want to drop empty rows/columns and you don't need to do any additional calculations in you job, this should work for your:
df.select(df.columns.map(colName => {
count(when(col(colName).isNull, true)) as s"${colName}_nulls_count"
}): _*)
.show(10) // or save result somewhere
Upvotes: 8