Reputation: 422
So I have a table with a lot of columns and rows and I need to filter it so columns with null values get droped.
In python I can do that this way(small example) :
df = pd.DataFrame({'col1': [1, None,3], 'col2': [3, 4,5],'col3': [3, 4,None]})
df.dropna(axis=1,how='any')
But I just have no clue how to do it in postgres.
I could not find an answer to that problem. I did find some answers but they use the column names which I just can't use because I have too many.
Thanks!
Upvotes: 2
Views: 4338
Reputation: 1778
I can't claim it will break any speed records, but it will do what you ask. No dynamic SQL or user-defined functions necessary.
SELECT t.*
FROM your_table as t
-- If nulls are present, these will not be equal
WHERE to_jsonb(t) = jsonb_strip_nulls(to_jsonb(t))
If performance becomes a real concern such as having to run this query many times, you could create an expression index for it. However, I would recommend normalizing your database's data model if that's the case. You may just be papering over structural defects.
CREATE INDEX nulls_detected
ON your_table (to_jsonb(your_table) = jsonb_strip_nulls(to_jsonb(your_table)));
Further optimizations could probably be found using a bloom filter for your index.
Here's an example of this in action:
CREATE TABLE null_example (
id serial PRIMARY KEY,
col1 int,
col2 text,
col3 boolean
);
INSERT INTO null_example (col1, col2, col3) VALUES
(1, 'test1', true),
(NULL, 'test2', false),
(3, NULL, true),
(4, 'test4', NULL),
(5, 'test5', false);
Now if you run the following…
SELECT t.*
FROM null_example AS t
WHERE to_jsonb(t) = jsonb_strip_nulls(to_jsonb(t));
…you get the following output. Any rows that contain NULL column values have been omitted.
id | col1 | col2 | col3
---+------+-------+------
1 | 1 | test1 | t
5 | 5 | test5 | f
If you are trying to target columns for removal such as from an ALTER TABLE … DROP COLUMN
statement, the following query can help you along the way as well.
SELECT results.key, count(*), array_agg(t.id) AS affected_ids
FROM null_example AS t
CROSS JOIN LATERAL jsonb_each(to_jsonb(t)) AS results(key, value)
WHERE results.value = 'null'::jsonb
GROUP BY results.key
This returns:
key | count | affected_ids
-----+-------+--------------
col2 | 1 | {3}
col3 | 1 | {4}
col1 | 1 | {2}
Upvotes: 2
Reputation: 10018
There isn't a facility to do this in Postgres, so you'll have to build-your-own function. One way to approach this is to retrieve all the column names with:
select attname
from pg_attribute
where attrelid = (select oid from pg_class where relname = 'foo')
and attnum > 0;
Then, loop through the attname
s and perform:
select count(*)
from foo
where <attname> is null;
A possible function may look like;
postgres=# create table foo (col1 int, col2 int, col3 int);
CREATE TABLE
postgres=# insert into foo values (1, null, null);
INSERT 0 1
postgres=# insert into foo values (1, 1, null);
INSERT 0 1
postgres=# insert into foo values (1, null, 1);
INSERT 0 1
postgres=# CREATE OR REPLACE function find_null_cols() RETURNS setof record
AS
$$
declare
rr record;
r record;
c int;
begin
for r in (select attname
from pg_attribute
where attrelid = (select oid from pg_class where relname = 'foo')
and attnum > 0)
loop
execute format ('select count(*) from foo where %s is null', r.attname) into c;
if c > 0 then
select r.attname::text, c into rr;
return next rr;
end if;
end loop;
return;
end
$$
LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# select * from find_null_cols() as x(colname text, nullcount_rows int);
colname | nullcount_rows
---------+----------------
col2 | 2
col3 | 2
(2 rows)
I won't add the DROP COLUMN
commands for your safety ;)
Upvotes: 1
Reputation: 126
dynamic sql in postgresql
You can use 'execute statement'. Get the names of the columns with
SELECT *
FROM information_schema.columns
WHERE table_schema = 'your_schema'
AND table_name = 'your_table'
;
and for each one of them, make a statement 'delete from' your view if the column has any null value
DECLARE
column_name text;
BEGIN
FOR column_name IN SELECT FROM information_schema.columns WHERE table_schema = 'your_schema' AND table_name = 'your_table' LOOP
...
EXECUTE 'delete from ... where(SELECT ....' || column_name || '.....)';
...
END LOOP;
END;
Upvotes: 0