Reputation: 1074
What is the Snowflake SQL version of this SQL Server solution that counts NULL values in each column of a table? The SQL executes via the SQL Server EXEC
function. I would prefer to not use a stored procedure because of access restrictions but such a solution would be good to have for the future.
I would prefer the solution display the column name in one column and the number of NULL values in another a la:
COLUMN_NAME NULLVALUES
HATS 325
SHOES 0
DOGS 9998
The dynamic SQL (SQL built at runtime) below is the best I have been able to do so far. The name of the column containing the column names will unfortunately be the first column in the table. Ideally, the SQL would execute automatically but experiments with BEGIN and END as well as EXECUTE IMMEDIATE did not work.
-- SUMMARY: Count NULL values for every column of the specified table.
-- This SQL dynamically builds a query that counts the NULL values in every column of a table.
-- One copies and pastes the code in the resulting SQLROW column (can double-click to open query window then copy to clipboard)
USE DATABASE YOUR_DATABASE
-- What database contains the table of interest?
SET DBName = 'YOUR_DATABASE';
-- What is the schema of the table?
SET SchemaName = 'YOUR_SCHEMA';
--What is the table name?
SET TableName = 'YOUR_TABLE';
SELECT ($DBName || '.' || $SchemaName || '.' || $TableName) as FullTablePath;
WITH SQLText AS (
SELECT
ROW_NUMBER() OVER (ORDER BY c.column_name) AS RowNum,
'SELECT ' || '''' || c.column_name || '''' ||
', SUM(CASE WHEN ' || c.column_name || ' IS NULL THEN 1 ELSE 0 END) AS NullValues
FROM ' || $DBName || '.' || $SchemaName || '.' || $TableName AS SQLRow
FROM
information_schema.tables t
INNER JOIN information_schema.columns c
ON c.table_name = t.table_name and c.table_schema = t.table_schema
WHERE
t.table_name = $TableName and t.table_schema = $SchemaName),
Recur AS (
SELECT
RowNum,
TO_VARCHAR(SQLRow) AS SQLRow
FROM
SQLText
WHERE
RowNum = 1
UNION ALL
SELECT
t.RowNum,
r.SQLRow || ' UNION ALL ' || t.SQLRow
FROM
SQLText t
INNER JOIN Recur r ON t.RowNum = r.RowNum + 1
),
no_dupes as (
select * from Recur where RowNum = (SELECT MAX(RowNum) FROM Recur)
)
select SQLRow from no_dupes
Upvotes: 2
Views: 2055
Reputation: 25958
some setup:
create table test.test.some_nulls(a string, b int, c boolean) as
select * from values
('a', 1, true),
('b', 2, false),
('c', 3, null),
('d', null, true),
('e', null, false),
('f', null, null),
(null, 7, true),
('h', 8, false),
(null, 9, null),
('j', null, true),
(null, null, false),
('l', null, null);
and:
SET DBName = 'TEST';
SET SchemaName = 'TEST';
SET TableName = upper('some_nulls');
this is the sql we are wanting to have run in the end:
select count(*)-count(a) as a,
count(*)-count(b) as b,
count(*)-count(c) as c
from TEST.TEST.some_nulls;
A | B | C |
---|---|---|
3 | 6 | 4 |
and this SQL builds that SQL, via LISTAGG:
select 'select '
|| listagg('count(*)-count('||column_name||') as ' || column_name, ', ') WITHIN GROUP (order by ordinal_position )
|| ' from '|| $DBName ||'.'|| $SchemaName ||'.'|| $TableName as sql
from information_schema.columns
where table_name = $TableName;
SQL |
---|
select count(*)-count(A) as A, count(*)-count(B) as B, count(*)-count(C) as C from TEST.TEST.SOME_NULLS |
thus this SQL builds, and runs the second SQL and then runs the results which is the first SQL, via RESULTSETs, CURSORS, Snowflake Scripting
declare
res resultset;
select_statement text;
begin
select_statement := 'select ''select '' '||
' || listagg(''count(*)-count(''||column_name||'') as '' || column_name, '', '') WITHIN GROUP (order by ordinal_position ) ' ||
' || '' from '|| $DBName ||'.'|| $SchemaName ||'.'|| $TableName || ''' as sql' ||
' from '||$DBName||'.information_schema.columns '||
' where table_name = ''' || $TableName || '''';
res := (execute immediate :select_statement);
let cur1 cursor for res;
for record in cur1 do
res := (execute immediate record.sql);
break;
end for;
return table(res);
end;
and gives:
A | B | C |
---|---|---|
3 | 6 | 4 |
Upvotes: 2