user1288241
user1288241

Reputation: 43

Count values for every column in a table

Currently using Postgres 9.1. I'm looking for a way to get the count for every column in a specific table. It has around 600 columns and looks like this:

CREATE TABLE geoproject.mes_wastab (
  invid character varying(16)
, invtype character varying(3)
, smpid integer
, smpname character varying(40)
, smpdate date
, smptime character varying(5)
, o2st_p double precision
, wspgokna double precision
, o2_p double precision
, absenkungs double precision
, klarstrom3 double precision
, ...
);

Column names should be queried from information_schema.columns. Not-null values should be counted. The result should look like this:

column_name : count

Can I query this with SQL, or do I need a function?

Upvotes: 2

Views: 4715

Answers (2)

Sotis
Sotis

Reputation: 196

I wanted a dynamic solution for this problem, so, using the query proposed by @Erwin, here is a function that only needs the table and schema names and outputs a table like:

columns, percentage
------------------
colname1, perc1
colname2, perc2
...
colnamen, percn
CREATE OR REPLACE FUNCTION public.completeness_histogram(_tabella text, _schema text)
 RETURNS TABLE(columns text, percentage numeric)
 LANGUAGE plpgsql
AS $function$
declare 
    seed_query text;
    col_list text;
    intermediate_query text;
    final_query text;
begin

    SELECT 'SELECT ' || string_agg(concat('round(100 * count(', col
                      , ') / count(*)::numeric, 2) AS ', col_pct), E'\n     , ')
        || E'\nFROM   ' ||  tbl into seed_query
    FROM (
       SELECT quote_ident(table_schema) || '.' || quote_ident(table_name) AS tbl
            , quote_ident(column_name) AS col
            , quote_ident(column_name) AS col_pct
       FROM   information_schema.columns
       WHERE  table_name = _tabella
       and    table_schema = _schema
       ORDER  BY ordinal_position
       ) sub
    GROUP  BY tbl;

    select string_agg(col_pct, ', ') into col_list
    from (
       SELECT quote_ident(column_name) AS col_pct
       FROM   information_schema.columns
       WHERE  table_name = _tabella
       and table_schema = _schema
       ORDER  BY ordinal_position
    ) foo;

    
    intermediate_query := format('SELECT ''SELECT * FROM unnest(
      ''''{%s}''''::text[]
    , '' || string_agg(quote_literal(ARRAY[%s])
                  || ''::numeric[]'', E''\n, '')
        || E'') \n AS t(col, completezza)'' AS sql
    FROM   (
        %s
    ) foo;', col_list, col_list, seed_query);

    execute format(intermediate_query)  into final_query;

    return query execute format(final_query);
    
end;
$function$
;

Upvotes: 1

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657912

This query will create the DML statement to get what you want.

SELECT 'SELECT ' || string_agg('count(' || quote_ident(attname) || ')', ', ')
    || 'FROM '   || attrelid::regclass
FROM   pg_attribute
WHERE  attrelid = 'mytbl'::regclass
AND    attnum  >= 1           -- exclude tableoid & friends (neg. attnum)
AND    attisdropped is FALSE  -- exclude deleted columns
GROUP  BY attrelid;

Returns:

SELECT count(col1), count(col2), count(col3), ...
FROM   mytbl

You can automatically execute it, too. But not in plan SQL, you need EXECUTE in a plpgsql function or DO statement (PostgreSQL 9.0 or later) for that.

You also need Postgres 9.0 or later for the string_agg() function. In older versions, you can substitute: array_to_string(array_agg(...), ', ').

You may wonder about the special cast 'mytbl'::regclass. Read more about object identifier types in the manual.

BTW: NULL values do not add to COUNT(col) by default.

Substitute the (schema-qualified) table name for mytbl. In your case that should be:

...
WHERE  attrelid = 'geoproject.mes_wastab'::regclass
...

If you should be using mixed case or otherwise messed up identifiers (note the quotes):

...
WHERE  attrelid = '"gEopRoject"."MES_wastab"'::regclass
...

Upvotes: 4

Related Questions