Morris de Oryx
Morris de Oryx

Reputation: 2195

Starting from a column type, how to find supported aggregations in Postgres?

I'm trying to figure out from a column type, which aggregates the data type supports. There's a lot of variety amongst types, just a sample below (some of these support more aggregates, of course):

uuid     count()
text     count(), min(), max()
integer  count(), min, max(),avg(),sum()

I've been thrashing around in the system catalogs and views, but haven't found what I'm after. (See "thrashing around.") I've poked at pg_type, pg_aggregate, pg_operator, and a few more.

Is there a straightforward way to start from a column type and gather all supported aggregates?

For background, I'm writing a client-side cross-tab code generator, and the UX is better when the tool automatically prevents you from selecting an aggregation that's not supported. I've hacked in some hard-coded rules for now, but would like to improve the system.

We're on Postgres 11.4.

Upvotes: 2

Views: 385

Answers (4)

Morris de Oryx
Morris de Oryx

Reputation: 2195

I've been playing around with the suggestions a bit, and want to post one adaptation based on one of Erwin's scripts:

select type_id::regtype::text as type_name,
       array_agg(proname) as aggregate_names

from  (
       select proname, 
              unnest(proargtypes::regtype[])::text AS type_id
         from pg_proc
        where prokind = 'a'
     order by 2, 1
   ) subquery

  where type_id in ('"any"', 'bigint', 'boolean','citext','date','double precision','integer','interval','numeric','smallint',
                    'text','time with time zone','time without time zone','timestamp with time zone','timestamp without time zone')

group by type_id;

That brings back details on the types specified in the where clause. Not only is this useful for my current work, it's useful to my understanding generally. I've run into cases where I've had to recast something, like an integer to a double, to get it to work with an aggregate. So far, this has been pretty much trial and error. If you run the query above (or one like it), it's easier to see from the output where you need recasting between similar seeming types.

Upvotes: 0

sticky bit
sticky bit

Reputation: 37477

There's the pg_proc catalog table, that lists all functions. The column proisagg marks aggregation functions and the column proargtypes holds an array of the OIDs of the argument types.

So for example to get a list of all aggregation functions with the names of their arguments' type you could use:

SELECT pro.proname aggregationfunctionname,
       CASE
         WHEN array_agg(typ.typname ORDER BY proarg.position) = '{NULL}'::name[] THEN
           '{}'::name[]
         ELSE
            array_agg(typ.typname ORDER BY proarg.position)
       END aggregationfunctionargumenttypes
       FROM pg_proc pro
            CROSS JOIN LATERAL unnest(pro.proargtypes) WITH ORDINALITY proarg (oid,
                                                                               position) 
            LEFT JOIN pg_type typ
                      ON typ.oid = proarg.oid
       WHERE pro.proisagg
       GROUP BY pro.oid,
                pro.proname
       ORDER BY pro.proname;

Of course you may need to extend that, e.g. joining and respecting the schemas (pg_namespace) and check for compatible types in pg_type (have a look at the typcategory column for that), etc..


Edit:

I overlooked, that proisagg was removed in version 11 (I'm still mostly on a 9.6) as the other answers mentioned. So for the sake of completeness: As of version 11 replace WHERE pro.proisagg with WHERE pro.prokind = 'a'.

Upvotes: 1

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658172

A plain list of available aggregate functions can be based on pg_proc like this:

SELECT oid::regprocedure::text AS agg_func_plus_args
FROM   pg_proc
WHERE  prokind = 'a'
ORDER  BY 1;

Or with separate function name and arguments:

SELECT proname AS agg_func, pg_get_function_identity_arguments(oid) AS args
FROM   pg_proc
WHERE  prokind = 'a'
ORDER  BY 1, 2;

pg_proc.prokind replaces proisagg in Postgres 11. In Postgres 10 or older use:

...
WHERE proisagg
...

Related:

To get a list of available functions for every data type (your question), start with:

SELECT type_id::regtype::text, array_agg(proname) AS agg_functions
FROM  (
   SELECT proname, unnest(proargtypes::regtype[])::text AS type_id
   FROM   pg_proc
   WHERE  proisagg
   ORDER  BY 2, 1
   ) sub
GROUP  BY type_id;

db<>fiddle here

Just a start. Some of the arguments are just "direct" (non-aggregated) (That's also why some functions are listed multiple times - due to those additional non-aggregate columns, example string_agg). And there are special cases for "ordered-set" and "hypothetical-set" aggregates. See the columns aggkind and aggnumdirectargs of the additional system catalog pg_aggregate. (You may want to exclude the exotic special cases for starters ...)

And many types have an implicit cast to one of the types listed by the query. Prominent example string_agg() works with varchar, too, but it's only listed for text above. You can extend the query with information from pg_cast to get the full picture.

Plus, some aggregates work for pseudo types "any", anyarray etc. You'll want to factor those in for every applicable data type.

The complication of multiple aliases for the same data type names can be eliminated easily, though: cast to regtype to get canonical names. Or use pg_typeof() which returns standard names. Related:

Upvotes: 2

Morris de Oryx
Morris de Oryx

Reputation: 2195

Man, that is just stunning Thank you. The heat death of the universe will arrive before I could have figured that out. I had to tweak one line for PG 11 compatibility...says the guy who did not say what version he was on. I've reworked the query to get close to what I'm after and included a bit of output for the archives.

with aggregates as (
SELECT pro.proname aggregate_name,
       CASE
         WHEN array_agg(typ.typname ORDER BY proarg.position) = '{NULL}'::name[] THEN
           '{}'::name[]
         ELSE
            array_agg(typ.typname ORDER BY proarg.position)
       END aggregate_types
       FROM pg_proc pro
            CROSS JOIN LATERAL unnest(pro.proargtypes) WITH ORDINALITY proarg (oid,
                                                                               position) 
            LEFT JOIN pg_type typ
                      ON typ.oid = proarg.oid
       WHERE pro. prokind = 'a' -- I needed this for PG 11, I didn't say what version I was using. 

       GROUP BY pro.oid,
                pro.proname

       ORDER BY pro.proname),

 -- The *super helpful* code above is _way_ past my skill level with Postgres. So, thrashing around a bit to get close to what I'm after.
 -- First up, a CTE to sort everything by aggregation and then combine the types.
aggregate_summary as (
 select aggregate_name,
        array_agg(aggregate_types) as types_array

   from aggregates

 group by 1
   order by 1)

-- Finally, the previous CTE is used to get the details and a count of the types.   
 select aggregate_name,
        cardinality(types_array) as types_count, -- Couldn't get array_length to work here. ¯\_(ツ)_/¯
        types_array

   from aggregate_summary

   limit 5;

And a bit of output:

aggregate_name   types_count    types_array
array_agg        2              {{anynonarray},{anyarray}}
avg              7              {{int8},{int4},{int2},{numeric},{float4},{float8},{interval}}
bit_and          4              {{int2},{int4},{int8},{bit}}
bit_or           4              {{int2},{int4},{int8},{bit}}
bool_and         1              {{bool}}

Still on my wish list are

  • Figuring out how to execute arrays (we aren't using array fields now, and only have a few places that we ever might. At that point, I don't expect we'll try and support pivots on arrays. tab tool

  • Getting all of the aliases for the various types. it seems like (?) int8, etc. can come through from pg_attribute in multiple ways. For example, timestamptz can come back from "timestamp with time zone".

These results are going to be consumed by client-side code and processed, so I don't need to get Postgres to figure everything out in one query, just enough for me to get the job done.

In any case, thanks very, very much.

Upvotes: 2

Related Questions