Reputation: 953
I'm trying to query a Netezza table to get count of rows where a column value is NULL -- for all columns. Concretely, assume we have the following table (titled, say, merchants
)
business_name phone email
-------------------------------------------------------
NULL 505-844-1234 john@example.com
Alibaba NULL mary@domain.com
NULL NULL harry@company.com
I want to generate an output table like:
column_name NULL_count
-------------------------------
business_name 2
phone 2
email 0
I can generate it for individual columns using:
select count(*)
from merchants
where <column_name> is null;
However, my table has over 100 columns and I don't want to hand code the queries. I'm aware that I can write a Java/Python code to query the table programmatically or even write a script to generate ~100 queries. However, the task does not seem overly complicated and I feel it should be achievable in pure SQL. The list of columns for every Netezza table is available via:
SELECT column_name
FROM information_schema.columns
WHERE LOWER(table_name) = 'merchants'`
I want to run the above count(*)
query for every column in the above list. I'm having difficulty figuring the proper join and/or using stored procedure. So far, I've tried modifying a Netezza stored procedure as explained here, but I keep getting syntax errors on the original code listed there.
TL;DR How can I generate null count for all columns in Netezza ?
Upvotes: 0
Views: 2635
Reputation: 1043
Code for table shown in question -
VARUNDBTEST.ADMIN(ADMIN)=> create table sample(business_name varchar(255), phone varchar(255), email varchar(255));
CREATE TABLE
VARUNDBTEST.ADMIN(ADMIN)=> insert into sample values(NULL,'505-844-1234','john@example.com');
INSERT 0 1
VARUNDBTEST.ADMIN(ADMIN)=> insert into sample values('Alibaba',NULL,'mary@domain.com');
INSERT 0 1
VARUNDBTEST.ADMIN(ADMIN)=> insert into sample values(null,NULL,'harry@company.com');
INSERT 0 1
VARUNDBTEST.ADMIN(ADMIN)=> select * from sample;
BUSINESS_NAME | PHONE | EMAIL
---------------+--------------+-------------------
Alibaba | | mary@domain.com
| 505-844-1234 | john@example.com
| | harry@company.com
(3 rows)
Code for Reference table -
Table "TBL1"
Attribute | Type | Modifier | Default Value
-------------+------------------------+----------+---------------
COLUMN_NAME | CHARACTER VARYING(255) | |
NULL_COUNT | CHARACTER VARYING(255) | |
Distributed on hash: "COLUMN_NAME"
Procedure Code (getNullCount.sql)-
CREATE OR REPLACE PROCEDURE getNullCount()
LANGUAGE NZPLSQL RETURNS REFTABLE(tbl1) AS
BEGIN_PROC
DECLARE
p_abc RECORD;
p_bcd RECORD;
BEGIN
FOR p_abc IN
SELECT column_name FROM information_schema.columns WHERE LOWER(table_name) = 'sample'
LOOP
FOR p_bcd IN
execute 'SELECT COUNT(*) as col_null_count FROM SAMPLE WHERE '|| p_abc.column_name||' is null'
LOOP
execute immediate 'INSERT INTO '|| REFTABLENAME||' VALUES('||quote_literal(p_abc.column_name)||','|| p_bcd.col_null_count||')';
END LOOP;
END LOOP;
return reftable;
END;
END_PROC;
Procedure creation and execution -
VARUNDBTEST.ADMIN(ADMIN)=> \i getNullCount.sql
CREATE PROCEDURE
VARUNDBTEST.ADMIN(ADMIN)=> call getnullcount();
COLUMN_NAME | NULL_COUNT
---------------+------------
BUSINESS_NAME | 2
EMAIL | 0
PHONE | 2
(3 rows)
Hope this will help.
Upvotes: 4