iconoclast
iconoclast

Reputation: 22610

How to find all rows with a NULL value in any column using PostgreSQL

There are many slightly similar questions, but none solve precisely this problem. "Find All Rows With Null Value(s) in Any Column" is the closest one I could find and offers an answer for SQL Server, but I'm looking for a way to do this in PostgreSQL.

How can I select only the rows that have NULL values in any column?

I can get all the column names easily enough:

select column_name from information_schema.columns where table_name = 'A';

but it's unclear how to check multiple column names for NULL values. Obviously this won't work:

select* from A where (
  select column_name from information_schema.columns where table_name = 'A';
) IS NULL;

And searching has not turned up anything useful.

Upvotes: 41

Views: 50451

Answers (3)

Marth
Marth

Reputation: 24802

You can use NOT(<table> IS NOT NULL).

From the documentation :

If the expression is row-valued, then IS NULL is true when the row expression itself is null or when all the row's fields are null, while IS NOT NULL is true when the row expression itself is non-null and all the row's fields are non-null.

So given table t,

SELECT * FROM t;
┌────────┬────────┐
│   f1   │   f2   │
├────────┼────────┤
│ (null) │      1 │
│      2 │ (null) │
│ (null) │ (null) │
│      3 │      4 │
└────────┴────────┘
(4 rows)

here are all the possible variants:

SELECT * FROM t             SELECT * FROM t           
WHERE (t IS NULL);          WHERE (t IS NOT NULL);    
┌────────┬────────┐         ┌────────┬────────┐       
│   f1   │   f2   │         │   f1   │   f2   │       
├────────┼────────┤         ├────────┼────────┤       
│ (null) │ (null) │         │      3 │      4 │       
└────────┴────────┘         └────────┴────────┘       
(1 row)                     (1 rows)                  
                                                      
                                                      

SELECT * FROM t             SELECT * FROM t           
WHERE NOT (t IS NULL);      WHERE NOT (t IS NOT NULL);
┌────────┬────────┐         ┌────────┬────────┐       
│   f1   │   f2   │         │   f1   │   f2   │       
├────────┼────────┤         ├────────┼────────┤       
│ (null) │      1 │         │ (null) │      1 │       
│      2 │ (null) │         │      2 │ (null) │       
│      3 │      4 │         │ (null) │ (null) │       
└────────┴────────┘         └────────┴────────┘       
(3 rows)                    (3 rows)                  

Upvotes: 71

V&#233;race
V&#233;race

Reputation: 908

An alternative solution which might be of use in some edge cases could be the following (all of the code below is available on the fiddle here):

Setup:

CREATE TABLE t
(
  f1 INT,
  f2 INT,
  f3 TEXT,
  f4 INT,
  f5 TEXT
  --
  -- arbitrary number of fields...
  --
);

Populate:

INSERT INTO t VALUES
(null, 1, 'blah', 5, 'xxx'),
(2, null, 'yyy', 6, 'wwww'),
(null, null, 'ggg', 99, 'wwww'), 
(3, 4, 'yyy', 17, 'yuoyo');  -- no NULLs

Row SQL:

First, we run:

SELECT ROW(t) FROM t;

Result:

row
("(,1,blah,5,xxx)")
("(2,,yyy,6,wwww)")
("(,,ggg,99,wwww)")
("(3,4,yyy,17,yuoyo)")

Hmmm... a bit messy, so now we run:

SELECT ROW(t.*) FROM t;

Result:

row
(,1,blah,5,xxx)
(2,,yyy,6,wwww)
(,,ggg,99,wwww)
(3,4,yyy,17,yuoyo)

OK, this is bit tidier - so, now , the solution that I'm proposing. Please bear in mind that I'm not suggesting that it is better than the accepted answer (see performance analysis below). What I am suggesting is that my solution might be of use in some tricky edge cases.

So, we run:

WITH cte AS
(
  SELECT ROW(t.*)::TEXT FROM t
)
SELECT * FROM cte
WHERE row ~ '^\(,|,,|,\)';

Explanation of the regex:

  • ^ - the caret character is an anchor - it means the start of the string.

  • \ - is the escape character.

  • ( - the character being escaped. It's a metacharacter (i.e. special meaning in regexes) - normally for capturing groups, but being escaped, it's now a literal bracket.

  • , - a literal comma.

  • | - the pipe character - alternation in regexes - i.e. OR - taken altogether, ^\(,|,, means either (, OR ,,.

  • |,\) - wrapping it up, the final OR and then matching ,).

So, the regex matches either (, OR ,, OR ,). Basically, this pulls out any row where there is a NULL either at the beginning of the row string, OR in the middle OR at the end of the string.

  • ** a word of warning, this regex could fail if you have "weird" text strings with random characters - it is conceivable that you may get one of the combinations above - caveat emptor!

Result:

row
(,1,blah,5,xxx)  -- matches (,
(2,,yyy,6,wwww)  -- matches ,,
(,,ggg,99,wwww)  -- matches both (, and ,,
--
-- Note that the row with no NULLs has been excluded, as desired
--

Performance analysis:

Queries should always be checked for performance, so we run:

EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SETTINGS)
WITH cte AS
(
  SELECT ROW(t.*)::TEXT FROM t
)
SELECT * FROM cte
WHERE row ~ '^\(,|,,|,\)';

Result (details snipped, see fiddle):

Planning Time: 0.038 ms
Execution Time: 0.062 ms

and the same for SELECT * FROM t WHERE NOT (t IS NOT NULL);:

Planning Time: 0.039 ms
Execution Time: 0.018 ms

So, we can see that the accepted answer takes ~ 30% of the time that this regex solution does. Regexes are very powerful, but with greater power comes greater complexity - apply regexes with caution! A great site to learn about them is to be found here.

Upvotes: 1

Imaduddin
Imaduddin

Reputation: 55

After getting the list of column and table, you can use LOOP and EXECUTE to run them one by one.

This wouldn't give you the answer in table form, but at least you can eyeball it via the RAISE NOTICEs, as for my use case, it's only part of a maintenance/patching work.

DO $$ 
DECLARE 
    id_columns TEXT;
    id_column_tables TEXT;
    null_count INTEGER;
BEGIN
    FOR id_columns, id_column_tables IN (
        SELECT column_name, table_name
        FROM information_schema.columns 
        WHERE table_name = 'A' 
    )
    LOOP
        EXECUTE FORMAT(
            'SELECT COUNT(*) FROM %I WHERE %I is null',
            id_column_tables,
            id_columns
        ) INTO null_count;
        
        IF null_count > 0 then
            RAISE NOTICE 'Column: % in TABLE: % - Null Count: %', id_columns, id_column_tables, null_count;
        END IF;
    END LOOP;
END $$;

Upvotes: 0

Related Questions