Rick Pack
Rick Pack

Reputation: 1074

Snowflake: Count number of NULL values in each column SQL

What is the Snowflake SQL version of this SQL Server solution that counts NULL values in each column of a table? The SQL executes via the SQL Server EXEC function. I would prefer to not use a stored procedure because of access restrictions but such a solution would be good to have for the future.

I would prefer the solution display the column name in one column and the number of NULL values in another a la:

COLUMN_NAME NULLVALUES

HATS        325
SHOES       0
DOGS        9998

The dynamic SQL (SQL built at runtime) below is the best I have been able to do so far. The name of the column containing the column names will unfortunately be the first column in the table. Ideally, the SQL would execute automatically but experiments with BEGIN and END as well as EXECUTE IMMEDIATE did not work.

-- SUMMARY: Count NULL values for every column of the specified table.
-- This SQL dynamically builds a query that counts the NULL values in every column of a table.
-- One copies and pastes the code in the resulting SQLROW column (can double-click to open query window then copy to clipboard)

USE DATABASE YOUR_DATABASE

-- What database contains the table of interest?
SET DBName         = 'YOUR_DATABASE';
-- What is the schema of the table?
SET SchemaName     = 'YOUR_SCHEMA';
--What is the table name?
SET TableName = 'YOUR_TABLE';

SELECT ($DBName || '.' || $SchemaName || '.' || $TableName) as FullTablePath;

WITH SQLText AS (
    SELECT 
        ROW_NUMBER() OVER (ORDER BY c.column_name) AS RowNum,
        'SELECT ' || '''' || c.column_name || '''' ||
         ', SUM(CASE WHEN ' || c.column_name || ' IS NULL THEN 1 ELSE 0 END) AS NullValues 
         FROM ' || $DBName || '.' || $SchemaName || '.' || $TableName AS SQLRow
    FROM 
        information_schema.tables t 
        INNER JOIN information_schema.columns c 
        ON c.table_name = t.table_name and c.table_schema = t.table_schema
    WHERE 
        t.table_name = $TableName and t.table_schema = $SchemaName),
Recur AS (
    SELECT
        RowNum,
        TO_VARCHAR(SQLRow) AS SQLRow
    FROM
        SQLText
    WHERE
        RowNum = 1
    UNION ALL
    SELECT
        t.RowNum,
        r.SQLRow || ' UNION ALL ' || t.SQLRow
    FROM
        SQLText t
        INNER JOIN Recur r ON t.RowNum = r.RowNum + 1
    ),
no_dupes as (
   select * from Recur where RowNum =  (SELECT MAX(RowNum) FROM Recur)
  )

select SQLRow from no_dupes 

Upvotes: 2

Views: 2055

Answers (1)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25958

some setup:

create table test.test.some_nulls(a string, b int, c boolean) as
    select * from values 
        ('a', 1, true),
        ('b', 2, false),
        ('c', 3, null),
        ('d', null, true),
        ('e', null, false),
        ('f', null, null),
        
        (null, 7, true),
        ('h', 8, false),
        (null, 9, null),
        ('j', null, true),
        (null, null, false),
        ('l', null, null);

and:

SET DBName = 'TEST';
SET SchemaName     = 'TEST';
SET TableName = upper('some_nulls'); 

this is the sql we are wanting to have run in the end:

select count(*)-count(a) as a, 
  count(*)-count(b) as b,
  count(*)-count(c) as c 
from TEST.TEST.some_nulls;
A B C
3 6 4

and this SQL builds that SQL, via LISTAGG:

select 'select ' 
    || listagg('count(*)-count('||column_name||') as ' || column_name, ', ')  WITHIN GROUP (order by ordinal_position )
    || ' from '|| $DBName ||'.'|| $SchemaName ||'.'|| $TableName as sql
from information_schema.columns
where table_name = $TableName;
SQL
select count(*)-count(A) as A, count(*)-count(B) as B, count(*)-count(C) as C from TEST.TEST.SOME_NULLS

thus this SQL builds, and runs the second SQL and then runs the results which is the first SQL, via RESULTSETs, CURSORS, Snowflake Scripting

declare
  res resultset;
  select_statement text;
begin

  select_statement := 'select ''select '' '||
    '   || listagg(''count(*)-count(''||column_name||'') as '' || column_name, '', '')  WITHIN GROUP (order by ordinal_position ) ' ||
    '   || '' from '|| $DBName ||'.'|| $SchemaName ||'.'|| $TableName || ''' as sql' ||
    ' from '||$DBName||'.information_schema.columns '||
    ' where table_name = ''' || $TableName || '''';
      
  res := (execute immediate :select_statement);
  let cur1 cursor for res;
  
  for record in cur1 do
    res := (execute immediate record.sql);
    break;
  end for;

  return table(res);
end;

and gives:

A B C
3 6 4

Upvotes: 2

Related Questions