ShayHa
ShayHa

Reputation: 422

Postgres get all columns that have no null values

So I have a table with a lot of columns and rows and I need to filter it so columns with null values get droped.

In python I can do that this way(small example) :


df = pd.DataFrame({'col1': [1, None,3], 'col2': [3, 4,5],'col3': [3, 4,None]})

df.dropna(axis=1,how='any')

But I just have no clue how to do it in postgres.

I could not find an answer to that problem. I did find some answers but they use the column names which I just can't use because I have too many.

Thanks!

Upvotes: 2

Views: 4338

Answers (3)

Miles Elam
Miles Elam

Reputation: 1778

I can't claim it will break any speed records, but it will do what you ask. No dynamic SQL or user-defined functions necessary.

SELECT t.*
FROM your_table as t
-- If nulls are present, these will not be equal
WHERE to_jsonb(t) = jsonb_strip_nulls(to_jsonb(t))

If performance becomes a real concern such as having to run this query many times, you could create an expression index for it. However, I would recommend normalizing your database's data model if that's the case. You may just be papering over structural defects.

CREATE INDEX nulls_detected
  ON your_table (to_jsonb(your_table) = jsonb_strip_nulls(to_jsonb(your_table)));

Further optimizations could probably be found using a bloom filter for your index.


Here's an example of this in action:

CREATE TABLE null_example (
  id serial PRIMARY KEY,
  col1 int,
  col2 text,
  col3 boolean
);

INSERT INTO null_example (col1, col2, col3) VALUES
(1, 'test1', true),
(NULL, 'test2', false),
(3, NULL, true),
(4, 'test4', NULL),
(5, 'test5', false);

Now if you run the following…

SELECT t.*
FROM null_example AS t
WHERE to_jsonb(t) = jsonb_strip_nulls(to_jsonb(t));

…you get the following output. Any rows that contain NULL column values have been omitted.

id | col1 | col2  | col3
---+------+-------+------
1  | 1    | test1 | t
5  | 5    | test5 | f

If you are trying to target columns for removal such as from an ALTER TABLE … DROP COLUMN statement, the following query can help you along the way as well.

SELECT results.key, count(*), array_agg(t.id) AS affected_ids
FROM null_example AS t
  CROSS JOIN LATERAL jsonb_each(to_jsonb(t)) AS results(key, value)
WHERE results.value = 'null'::jsonb
GROUP BY results.key

This returns:

 key | count | affected_ids
-----+-------+--------------
col2 | 1     | {3}
col3 | 1     | {4}
col1 | 1     | {2}

Upvotes: 2

richyen
richyen

Reputation: 10018

There isn't a facility to do this in Postgres, so you'll have to build-your-own function. One way to approach this is to retrieve all the column names with:

select attname
  from pg_attribute
 where attrelid = (select oid from pg_class where relname = 'foo')
   and attnum > 0;

Then, loop through the attnames and perform:

select count(*)
  from foo
 where <attname> is null;

A possible function may look like;

postgres=# create table foo (col1 int, col2 int, col3 int);
CREATE TABLE
postgres=# insert into foo values (1, null, null);
INSERT 0 1
postgres=# insert into foo values (1, 1, null);   
INSERT 0 1
postgres=# insert into foo values (1, null, 1);
INSERT 0 1
postgres=# CREATE OR REPLACE function find_null_cols() RETURNS setof record
AS
$$
declare
rr record;
r record;
c int;
begin
for r in (select attname
  from pg_attribute
 where attrelid = (select oid from pg_class where relname = 'foo')
   and attnum > 0)
loop
  execute format ('select count(*) from foo where %s is null', r.attname) into c;
  if c > 0 then
    select r.attname::text, c into rr;
    return next rr;
  end if;

end loop;
return;
end
$$
LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# select * from find_null_cols() as x(colname text, nullcount_rows int);
 colname | nullcount_rows 
---------+----------------
 col2    |              2
 col3    |              2
(2 rows)

I won't add the DROP COLUMN commands for your safety ;)

Upvotes: 1

Arcadio
Arcadio

Reputation: 126

dynamic sql in postgresql

You can use 'execute statement'. Get the names of the columns with

SELECT *
   FROM information_schema.columns
  WHERE table_schema = 'your_schema'
    AND table_name = 'your_table'
      ;

and for each one of them, make a statement 'delete from' your view if the column has any null value

DECLARE
    column_name text;
BEGIN
    FOR column_name IN SELECT FROM information_schema.columns WHERE table_schema = 'your_schema' AND table_name = 'your_table' LOOP

        ...
        EXECUTE 'delete from ... where(SELECT ....' || column_name || '.....)';
        ...

    END LOOP;
END;

Upvotes: 0

Related Questions