ant1j
ant1j

Reputation: 315

Grouping by similar values in multiple columns

I have a table of entities with an id, and a category (few different values with NULL allowed) from 3 different years (category can be different from 1 year to another), in 'wide' table format:

| ID  | CATEG_Y1 | CATEG_Y2 | CATEG_Y3 |
+-----+----------+----------+----------+
| 1   | NULL     | B        | C        |
| 2   | A        | A        | C        |
| 3   | B        | A        | NULL     |
| 4   | A        | C        | B        |
| ... | ...      | ...      | ...      |

I would like to simply count the number of entities by category, grouped by category, independently for the year:

+-------+----+----+----+
| CATEG | Y1 | Y2 | Y3 |
+-------+----+----+----+
| A     | 6  | 4  |  5 | <- 6 entities w/ categ_y1, 4 w/ categ_y2, 5 w/ categ_y3  
| B     | 3  | 1  | 10 |
| C     | 8  | 4  |  5 |
| NULL  | 3  | 3  |  3 |
+-------+----+----+----+

I guess I could do it by grouping values one column after the other and UNION ALL the results, but I was wondering if there was a more rapid & convenient way, and if it can be generalized if I have more columns/years to manage (e.g. 20-30 different values)

Upvotes: 0

Views: 209

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269873

I would do this as using union all followed by aggregation:

select categ, sum(categ_y1) as y1, sum(categ_y2) as y2,
       sum(categ_y3) as y3
from ((select categ_y1, 1 as categ_y1, 0 as categ_y2, 0 as categ_y3
       from t
      ) union all
      (select categ_y2, 0 as categ_y1, 1 as categ_y2, 0 as categ_y3
       from t
      ) union all
      (select categ_y3, 0 as categ_y1, 0 as categ_y2, 1 as categ_y3
       from t
      )
     )
group by categ ;

Upvotes: 1

klin
klin

Reputation: 121604

Use jsonb functions to transpose the data (from the question) to this format:

select categ, jsonb_object_agg(key, count) as jdata
from (
    select value as categ, key, count(*)
    from my_table t,
    jsonb_each_text(to_jsonb(t)- 'id')
    group by 1, 2
    ) s
group by 1
order by 1;

 categ |                     jdata                     
-------+-----------------------------------------------
 A     | {"categ_y1": 2, "categ_y2": 2}
 B     | {"categ_y1": 1, "categ_y2": 1, "categ_y3": 1}
 C     | {"categ_y2": 1, "categ_y3": 2}
       | {"categ_y1": 1, "categ_y3": 1}
(4 rows)

For a known (static) number of years you can easily unpack the jsonb column:

select categ, jdata->'categ_y1' as y1, jdata->'categ_y2' as y2, jdata->'categ_y3' as y3 
from (
    select categ, jsonb_object_agg(key, count) as jdata
    from (
        select value as categ, key, count(*)
        from my_table t,
        jsonb_each_text(to_jsonb(t)- 'id')
        group by 1, 2
        ) s
    group by 1
    ) s
order by 1;

 categ | y1 | y2 | y3 
-------+----+----+----
 A     | 2  | 2  | 
 B     | 1  | 1  | 1
 C     |    | 1  | 2
       | 1  |    | 1
(4 rows)    

To get fully dynamic solution you can use the function create_jsonb_flat_view() described in Flatten aggregated key/value pairs from a JSONB field.

Upvotes: 1

Stephan Lechner
Stephan Lechner

Reputation: 35154

A bit clumsy, but probably someone has a better idea. Query first collects all diferent categories (the union-query in the from part), and then counts the occurences with dedicated subqueries in the select part. One could omit the union-part if there is a table already defining the available categories (I suppose categ_y1 is a foreign key to such a primary category table). Hope there are not to many typos:

select categories.cat,
       (select count(categ_y1) from table ty1 where select categories.cat = categ_y1) as y1,
       (select count(categ_y2) from table ty2 where select categories.cat = categ_y2) as y2,
       (select count(categ_y3) from table ty3 where select categories.cat = categ_y3) as y3
from ( select categ_y1 as cat from table t1
 union select categ_y2 as cat from table t2
 union select categ_y3 as cat from table t3) categories 

Upvotes: 1

Related Questions