RDK
RDK

Reputation: 953

Generating NULL counts by looping over columns in Netezza

I'm trying to query a Netezza table to get count of rows where a column value is NULL -- for all columns. Concretely, assume we have the following table (titled, say, merchants)

business_name    phone                 email
-------------------------------------------------------
NULL             505-844-1234         john@example.com
Alibaba          NULL                 mary@domain.com
NULL             NULL                 harry@company.com

I want to generate an output table like:

column_name          NULL_count
-------------------------------
business_name          2
phone                  2
email                  0

I can generate it for individual columns using:

select count(*)
from merchants
where <column_name> is null;

However, my table has over 100 columns and I don't want to hand code the queries. I'm aware that I can write a Java/Python code to query the table programmatically or even write a script to generate ~100 queries. However, the task does not seem overly complicated and I feel it should be achievable in pure SQL. The list of columns for every Netezza table is available via:

SELECT column_name 
FROM information_schema.columns 
WHERE LOWER(table_name) = 'merchants'`

I want to run the above count(*) query for every column in the above list. I'm having difficulty figuring the proper join and/or using stored procedure. So far, I've tried modifying a Netezza stored procedure as explained here, but I keep getting syntax errors on the original code listed there.

TL;DR How can I generate null count for all columns in Netezza ?

Upvotes: 0

Views: 2635

Answers (1)

Varun Bajaj
Varun Bajaj

Reputation: 1043

Code for table shown in question -

VARUNDBTEST.ADMIN(ADMIN)=> create table sample(business_name varchar(255), phone varchar(255), email varchar(255));
CREATE TABLE
VARUNDBTEST.ADMIN(ADMIN)=> insert into sample values(NULL,'505-844-1234','john@example.com');
INSERT 0 1
VARUNDBTEST.ADMIN(ADMIN)=> insert into sample values('Alibaba',NULL,'mary@domain.com');
INSERT 0 1
VARUNDBTEST.ADMIN(ADMIN)=> insert into sample values(null,NULL,'harry@company.com');
INSERT 0 1
VARUNDBTEST.ADMIN(ADMIN)=> select * from sample;
 BUSINESS_NAME |    PHONE     |       EMAIL
---------------+--------------+-------------------
 Alibaba       |              | mary@domain.com
               | 505-844-1234 | john@example.com
               |              | harry@company.com
(3 rows)

Code for Reference table -

                   Table "TBL1"
  Attribute  |          Type          | Modifier | Default Value
-------------+------------------------+----------+---------------
 COLUMN_NAME | CHARACTER VARYING(255) |          |
 NULL_COUNT  | CHARACTER VARYING(255) |          |
Distributed on hash: "COLUMN_NAME"

Procedure Code (getNullCount.sql)-

CREATE OR REPLACE PROCEDURE getNullCount()
LANGUAGE NZPLSQL RETURNS REFTABLE(tbl1) AS

BEGIN_PROC

DECLARE
        p_abc     RECORD;
        p_bcd     RECORD;

BEGIN
        FOR p_abc IN
                SELECT column_name FROM information_schema.columns WHERE LOWER(table_name) = 'sample'
        LOOP
                FOR p_bcd IN
                        execute 'SELECT COUNT(*) as col_null_count FROM SAMPLE WHERE '|| p_abc.column_name||' is null'
                LOOP
                        execute immediate 'INSERT INTO '|| REFTABLENAME||' VALUES('||quote_literal(p_abc.column_name)||','|| p_bcd.col_null_count||')';
                END LOOP;

        END LOOP;

return reftable;

END;
END_PROC;

Procedure creation and execution -

VARUNDBTEST.ADMIN(ADMIN)=> \i getNullCount.sql
CREATE PROCEDURE


VARUNDBTEST.ADMIN(ADMIN)=> call getnullcount();
  COLUMN_NAME  | NULL_COUNT
---------------+------------
 BUSINESS_NAME | 2
 EMAIL         | 0
 PHONE         | 2
(3 rows)

Hope this will help.

Upvotes: 4

Related Questions