Reputation: 43
Currently using Postgres 9.1. I'm looking for a way to get the count for every column in a specific table. It has around 600 columns and looks like this:
CREATE TABLE geoproject.mes_wastab (
invid character varying(16)
, invtype character varying(3)
, smpid integer
, smpname character varying(40)
, smpdate date
, smptime character varying(5)
, o2st_p double precision
, wspgokna double precision
, o2_p double precision
, absenkungs double precision
, klarstrom3 double precision
, ...
);
Column names should be queried from information_schema.columns
. Not-null values should be counted. The result should look like this:
column_name : count
Can I query this with SQL, or do I need a function?
Upvotes: 2
Views: 4715
Reputation: 196
I wanted a dynamic solution for this problem, so, using the query proposed by @Erwin, here is a function that only needs the table and schema names and outputs a table like:
columns, percentage
------------------
colname1, perc1
colname2, perc2
...
colnamen, percn
CREATE OR REPLACE FUNCTION public.completeness_histogram(_tabella text, _schema text)
RETURNS TABLE(columns text, percentage numeric)
LANGUAGE plpgsql
AS $function$
declare
seed_query text;
col_list text;
intermediate_query text;
final_query text;
begin
SELECT 'SELECT ' || string_agg(concat('round(100 * count(', col
, ') / count(*)::numeric, 2) AS ', col_pct), E'\n , ')
|| E'\nFROM ' || tbl into seed_query
FROM (
SELECT quote_ident(table_schema) || '.' || quote_ident(table_name) AS tbl
, quote_ident(column_name) AS col
, quote_ident(column_name) AS col_pct
FROM information_schema.columns
WHERE table_name = _tabella
and table_schema = _schema
ORDER BY ordinal_position
) sub
GROUP BY tbl;
select string_agg(col_pct, ', ') into col_list
from (
SELECT quote_ident(column_name) AS col_pct
FROM information_schema.columns
WHERE table_name = _tabella
and table_schema = _schema
ORDER BY ordinal_position
) foo;
intermediate_query := format('SELECT ''SELECT * FROM unnest(
''''{%s}''''::text[]
, '' || string_agg(quote_literal(ARRAY[%s])
|| ''::numeric[]'', E''\n, '')
|| E'') \n AS t(col, completezza)'' AS sql
FROM (
%s
) foo;', col_list, col_list, seed_query);
execute format(intermediate_query) into final_query;
return query execute format(final_query);
end;
$function$
;
Upvotes: 1
Reputation: 657912
This query will create the DML statement to get what you want.
SELECT 'SELECT ' || string_agg('count(' || quote_ident(attname) || ')', ', ')
|| 'FROM ' || attrelid::regclass
FROM pg_attribute
WHERE attrelid = 'mytbl'::regclass
AND attnum >= 1 -- exclude tableoid & friends (neg. attnum)
AND attisdropped is FALSE -- exclude deleted columns
GROUP BY attrelid;
Returns:
SELECT count(col1), count(col2), count(col3), ...
FROM mytbl
You can automatically execute it, too. But not in plan SQL, you need EXECUTE
in a plpgsql function or DO
statement (PostgreSQL 9.0 or later) for that.
You also need Postgres 9.0 or later for the string_agg()
function. In older versions, you can substitute: array_to_string(array_agg(...), ', ')
.
You may wonder about the special cast 'mytbl'::regclass
. Read more about object identifier types in the manual.
BTW: NULL
values do not add to COUNT(col)
by default.
Substitute the (schema-qualified) table name for mytbl
. In your case that should be:
...
WHERE attrelid = 'geoproject.mes_wastab'::regclass
...
If you should be using mixed case or otherwise messed up identifiers (note the quotes):
...
WHERE attrelid = '"gEopRoject"."MES_wastab"'::regclass
...
Upvotes: 4