Snowflake count nulls in all columns

Question

I've seen a few questions like this - Count NULL Values from multiple columns with SQL

But is there really not a way to count nulls in a table with say, over 30 columns? Like I don't want to specify them all by name?

Lukasz Szozda · Accepted Answer

But is there really not a way to count nulls in a table with say, over 30 columns? Like I don't want to specify them all by name?

yes exactly that. I don't understand why it's so difficult - it's like 1 line in pandas?

Keypoint here is if something is not provided as "batteries included" then you need to write your own version. It is not so hard as it may look.

Let's say the input table is as follow:

CREATE OR REPLACE TABLE t AS SELECT $1 AS col1, $2 AS col2, $3 AS col3, $4 AS col4 
FROM VALUES (1,2,3,10),(NULL,2,3,10),(NULL,NULL,4,10),(NULL,NULL,NULL,10);

SELECT * FROM t;
/*
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
| 1    | 2    | 3    |   10 |
| NULL | 2    | 3    |   10 |
| NULL | NULL | 4    |   10 |
| NULL | NULL | NULL |   10 |
+------+------+------+------+
*/

You probably know how to write the query that gives the desired output, but as it was not provided in the question I will use my own version:

WITH cte AS (
  SELECT
      COUNT(*) AS total_rows
      ,total_rows - COUNT(col1) AS col1
      ,total_rows - COUNT(col2) AS col2
      ,total_rows - COUNT(col3) AS col3
      ,total_rows - COUNT(col4) AS col4
  FROM t
)
SELECT COLUMN_NAME, NULLS_COLUMN_COUNT,SUM(NULLS_COLUMN_COUNT) OVER() AS NULLS_TOTAL_COUNT
FROM cte
UNPIVOT (NULLS_COLUMN_COUNT FOR COLUMN_NAME IN (col1,col2,col3, col4))
ORDER BY COLUMN_NAME;
/*
+-------------+--------------------+-------------------+
| COLUMN_NAME | NULLS_COLUMN_COUNT | NULLS_TOTAL_COUNT |
+-------------+--------------------+-------------------+
| COL1        |                  3 |                 6 |
| COL2        |                  2 |                 6 |
| COL3        |                  1 |                 6 |
| COL4        |                  0 |                 6 |
+-------------+--------------------+-------------------+
*/

Here we could see that the query is "static" in nature with few moving parts(column_count_list/table_name/column_list):

WITH cte AS (
  SELECT
      COUNT(*) AS total_rows
      
  FROM 
)
SELECT COLUMN_NAME, NULLS_COLUMN_COUNT,SUM(NULLS_COLUMN_COUNT) OVER() AS NULLS_TOTAL_COUNT
FROM cte
UNPIVOT (NULLS_COLUMN_COUNT FOR COLUMN_NAME IN ())
ORDER BY COLUMN_NAME;

Now using the metadata and variables:

-- input
SET sch_name = 'my_schema';
SET tab_name = 't';

SELECT
     LISTAGG(c.COLUMN_NAME, ', ') WITHIN GROUP(ORDER BY c.COLUMN_NAME) AS column_list
    ,ANY_VALUE(c.TABLE_SCHEMA || '.' || c.TABLE_NAME) AS full_table_name
    ,LISTAGG(REPLACE(SPACE(6) || ',total_rows - COUNT() AS ' 
                              || CHAR(13)
             , '', c.COLUMN_NAME), '') 
     WITHIN GROUP(ORDER BY COLUMN_NAME) AS column_count_list

    ,REPLACE(REPLACE(REPLACE(
'WITH cte AS (
  SELECT
      COUNT(*) AS total_rows

  FROM 
)
SELECT COLUMN_NAME, NULLS_COLUMN_COUNT,SUM(NULLS_COLUMN_COUNT) OVER() AS NULLS_TOTAL_COUNT
FROM cte
UNPIVOT (NULLS_COLUMN_COUNT FOR COLUMN_NAME IN ())
ORDER BY COLUMN_NAME;'
    ,'',     column_count_list)
    ,'',            full_table_name)
    ,'',           column_list) AS query_to_run

FROM INFORMATION_SCHEMA.COLUMNS c
WHERE TABLE_SCHEMA = UPPER($sch_name)
  AND TABLE_NAME = UPPER($tab_name);

Running the code will generate the query to be run:

Copying the output and running it will give the output. This template could be further refined and wrapped with stored procedure if needed(but I will left it as an exercise).

Snowflake count nulls in all columns

Answers (2)

Related Questions