MHS
MHS

Reputation: 35

Group by multiple columns and get result as an array in separate columns

I have a table like the following:

| col_A | col_B |
|-------|-------|
| 1     | 1     |
| 1     | 2     |
| 1     | 3     |
| 2     | 1     |
| 2     | 2     |
| 2     | 3     |
| 3     | 1     |
| 3     | 2     |

I want to group and concatenate the results into an array like the following:

| col_A | col_B |
|-------|-------|
| 1,2   | 1,2,3 |
| 3     |  1,2  |

My attempt at writing a query:

SELECT col_A, array_agg(col_B ORDER BY col_B DESC) FROM table GROUP BY col_A;

However, this outputs:

| col_A | col_B   |
|-------|---------|
| 1     | {1,2,3} |
| 2     | {1,2,3} |
| 3     | {1,2}   |

Upvotes: 3

Views: 8192

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656231

In Postgres, ordering rows in a subquery is typically cheaper than adding an ORDER BY clause to the aggregate function:

SELECT array_agg(col_a) AS col_a, col_b
FROM  (
   SELECT col_a, array_agg(col_b) AS col_b
   FROM  (TABLE tbl ORDER BY col_a, col_b) t
   GROUP  BY 1
   ORDER  BY 1
   ) sub
GROUP  BY 2
ORDER  BY 1;

Produces your desired result exactly, including order of elements and rows.

db<>fiddle here

The 1st ORDER BY is to sort elements before constructing the array in col_b.
The 2nd ORDER BY is to sort elements before constructing the array in col_a.
The 3rd ORDER BY is to output rows in the order suggested by the OPs example result.

The query should work the same without 2nd and 3rd ORDER BY, because the 1st order is carried over. But there are no guarantees. (Things like parallelism might mess with order when querying big tables.) The presented query is guaranteed to produce the desired order.

Related:

About the positional references in GROUP BY and ORDER BY:

About (TABLE tbl ORDER BY col_a, col_b):

Upvotes: 1

Bergi
Bergi

Reputation: 664185

Seems like you want to aggregate col_b grouped by col_a, and then aggregate col_a grouped by the aggregated arrays. You can do this with a nested query, of which you did the first step already:

SELECT array_agg(col_a), col_b
FROM (SELECT col_a, array_agg(DISTINCT col_b ORDER BY col_b) AS col_b
      FROM example GROUP BY col_a) grouped_b
GROUP BY col_b;

(online demo)

Upvotes: 3

Related Questions