Christopher Turnbull
Christopher Turnbull

Reputation: 1003

Snowflake count nulls in all columns

I've seen a few questions like this - Count NULL Values from multiple columns with SQL

But is there really not a way to count nulls in a table with say, over 30 columns? Like I don't want to specify them all by name?

Upvotes: 8

Views: 10085

Answers (2)

Lukasz Szozda
Lukasz Szozda

Reputation: 176124

But is there really not a way to count nulls in a table with say, over 30 columns? Like I don't want to specify them all by name?

yes exactly that. I don't understand why it's so difficult - it's like 1 line in pandas?

Keypoint here is if something is not provided as "batteries included" then you need to write your own version. It is not so hard as it may look.

Let's say the input table is as follow:

CREATE OR REPLACE TABLE t AS SELECT $1 AS col1, $2 AS col2, $3 AS col3, $4 AS col4 
FROM VALUES (1,2,3,10),(NULL,2,3,10),(NULL,NULL,4,10),(NULL,NULL,NULL,10);

SELECT * FROM t;
/*
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
| 1    | 2    | 3    |   10 |
| NULL | 2    | 3    |   10 |
| NULL | NULL | 4    |   10 |
| NULL | NULL | NULL |   10 |
+------+------+------+------+
*/

You probably know how to write the query that gives the desired output, but as it was not provided in the question I will use my own version:

WITH cte AS (
  SELECT
      COUNT(*) AS total_rows
      ,total_rows - COUNT(col1) AS col1
      ,total_rows - COUNT(col2) AS col2
      ,total_rows - COUNT(col3) AS col3
      ,total_rows - COUNT(col4) AS col4
  FROM t
)
SELECT COLUMN_NAME, NULLS_COLUMN_COUNT,SUM(NULLS_COLUMN_COUNT) OVER() AS NULLS_TOTAL_COUNT
FROM cte
UNPIVOT (NULLS_COLUMN_COUNT FOR COLUMN_NAME IN (col1,col2,col3, col4))
ORDER BY COLUMN_NAME;
/*
+-------------+--------------------+-------------------+
| COLUMN_NAME | NULLS_COLUMN_COUNT | NULLS_TOTAL_COUNT |
+-------------+--------------------+-------------------+
| COL1        |                  3 |                 6 |
| COL2        |                  2 |                 6 |
| COL3        |                  1 |                 6 |
| COL4        |                  0 |                 6 |
+-------------+--------------------+-------------------+
*/

Here we could see that the query is "static" in nature with few moving parts(column_count_list/table_name/column_list):

WITH cte AS (
  SELECT
      COUNT(*) AS total_rows
      <column_count_list>
  FROM <table_name>
)
SELECT COLUMN_NAME, NULLS_COLUMN_COUNT,SUM(NULLS_COLUMN_COUNT) OVER() AS NULLS_TOTAL_COUNT
FROM cte
UNPIVOT (NULLS_COLUMN_COUNT FOR COLUMN_NAME IN (<column_list>))
ORDER BY COLUMN_NAME;

Now using the metadata and variables:

-- input
SET sch_name = 'my_schema';
SET tab_name = 't';

SELECT
     LISTAGG(c.COLUMN_NAME, ', ') WITHIN GROUP(ORDER BY c.COLUMN_NAME) AS column_list
    ,ANY_VALUE(c.TABLE_SCHEMA || '.' || c.TABLE_NAME) AS full_table_name
    ,LISTAGG(REPLACE(SPACE(6) || ',total_rows - COUNT(<col_name>) AS <col_name>' 
                              || CHAR(13)
             , '<col_name>', c.COLUMN_NAME), '') 
     WITHIN GROUP(ORDER BY COLUMN_NAME) AS column_count_list

    ,REPLACE(REPLACE(REPLACE(
'WITH cte AS (
  SELECT
      COUNT(*) AS total_rows
<column_count_list>
  FROM <table_name>
)
SELECT COLUMN_NAME, NULLS_COLUMN_COUNT,SUM(NULLS_COLUMN_COUNT) OVER() AS NULLS_TOTAL_COUNT
FROM cte
UNPIVOT (NULLS_COLUMN_COUNT FOR COLUMN_NAME IN (<column_list>))
ORDER BY COLUMN_NAME;'
    ,'<column_count_list>',     column_count_list)
    ,'<table_name>',            full_table_name)
    ,'<column_list>',           column_list) AS query_to_run

FROM INFORMATION_SCHEMA.COLUMNS c
WHERE TABLE_SCHEMA = UPPER($sch_name)
  AND TABLE_NAME = UPPER($tab_name);

Running the code will generate the query to be run:

enter image description here

Copying the output and running it will give the output. This template could be further refined and wrapped with stored procedure if needed(but I will left it as an exercise).

Upvotes: 10

Atul Verma
Atul Verma

Reputation: 1

@chris you should note that the metadata in Snowflake is similar to SQL Server. So anything you want to know at metadata level, would have already been solved by SQL Server practitioners. See this link - Count number of NULL values in each column in SQL This is different in Oracle where the metadata table gives the number of nulls in each column as well as density.

Upvotes: 0

Related Questions