user3496167
user3496167

Reputation: 175

Transpose PostgreSQL table or aggregation function doing nothing

I've got a PostgreSQL table which I have to transpose.
The original data looks like:

user_id role_id assigned
P0001   SA      t
P0001   Guest   t
P0002   SA      f
P0002   Guest   t

After transposing, it should look like as follows:

user_id SA  Guest
P0001   t   t
P0002   f   t

The following is what I used to generate the transposed table:

SELECT user_id, 
CAST(CAST(CASE role_id WHEN 'SA' THEN 1 ELSE 0 END) AS int) AS boolean) AS "SA",
CAST(CAST((CASE role_id WHEN 'Guest' THEN 1 ELSE 0 END) AS int) AS boolean) AS "Guest" 
FROM user_to_roles GROUP BY user_id

But it looks ugly. Is there a way to write an custom aggregation function doing nothing but returning the original value. So the above statement can be re-written as:

SELECT user_id, 
Do_Nothing(CASE role_id WHEN 'SA' THEN true ELSE false END) AS "SA",
Do_Nothing(CASE role_id WHEN 'Guest' THEN true ELSE false END) AS "Guest" 
FROM user_to_roles GROUP BY user_id

Edit:

Actually, I need a dynamic generated columns and implemented it using the cursor by the help of this article. And I looked into the documentation and write a custom aggregation like this:

CREATE AGGREGATE do_nothing(anyelement) (
  SFUNC=donothing,
  STYPE=anyelement,
  INITCOND=''
);

CREATE OR REPLACE FUNCTION donothing(anyelement var)
  RETURNS anyelement AS
$BODY$
    RETURN var;
$BODY$
  LANGUAGE sql IMMUTABLE
  COST 100;

Since I want to make it generic, anyelment was used here.
But the above code doesn't work at all.

Upvotes: 1

Views: 805

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656566

crosstab()

SELECT * FROM crosstab(
       'SELECT user_id, role_id, assigned
        FROM   tbl
        ORDER  BY 1,2'

      ,$$VALUES ('SA'::text), ('Guest')$$)
AS ct (user_id text, "SA" bool, "Guest" bool);

That's as fast as it gets with current versions of Postgres. (Faster than alternatives below.)

But neither of these solutions is completely "dynamic", which is not completely possible:

For a completely dynamic solution, you would have to output arrays or some other container type to return a variable number of columns. Recent related question with answers:

Poor man's crosstab with aggregated CASE expressions

For just a few columns: simple CASE statements combined with an aggregate function.

min() or max() work for most basic types ... per documentation:

any array, numeric, string, or date/time type

But not for boolean. Use bool_or() like @Richard suggested.
FALSE OR NULL would yield NULL, but all mentioned aggregate functions (incl. bool_or()) ignore NULL, so it works with NULL as well.

SELECT user_id
      ,bool_or(CASE WHEN role_id = 'SA'    THEN assigned ELSE NULL END) AS "SA"
      ,bool_or(CASE WHEN role_id = 'Guest' THEN assigned ELSE NULL END) AS "Guest"
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

You have to adapt to the data type expected.

Custom aggregate for any type

There is no standard aggregate function that works for any data type in the way you need it. You can create one:

CREATE OR REPLACE FUNCTION f_echo1(anyelement,anyelement)
  RETURNS anyelement AS 'SELECT $1' LANGUAGE sql IMMUTABLE STRICT;

CREATE AGGREGATE first_notnull(anyelement) (
   SFUNC = f_echo1
 , STYPE = anyelement
);

This is a simple, working version of what you outlined in your edit:
Starting with NULL (no INITCOND), using a STRICT transition function, it takes the first not-null value and the transition function just keeps that one. So we get the first not-null value back the aggregate function encounters.

This works for any input type:

SELECT user_id
      ,first_notnull(CASE WHEN role_id = 'SA'    THEN assigned ELSE NULL END) AS "SA"
      ,first_notnull(CASE WHEN role_id = 'Guest' THEN assigned ELSE NULL END) AS "Guest"
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Copy bool_or() as max()

Custom aggregate functions in sql or plpgsql are considerably slower than the built-in ones, which are written in C and optimized. You could write one in C yourself. Wouldn't be that hard. Or you can copy the definition of bool_or under the name of max in pg_proc and reference it in another copy in pg_aggregate to make max() work for boolean as well. Messing with catalog tables is not recommended for the casual user, though. If you break something your system might go belly-up ...

Upvotes: 1

Richard Huxton
Richard Huxton

Reputation: 22893

In your particular case you can get away with using a built-in aggregate bool_or. Something like:

SELECT user_id, 
bool_or(CASE role_id WHEN 'SA'    THEN assigned ELSE false END) AS "SA",
bool_or(CASE role_id WHEN 'Guest' THEN assigned ELSE false END) AS "Guest" 
FROM user_to_roles GROUP BY user_id

Upvotes: 1

Related Questions