Mike
Mike

Reputation: 1999

Replace dozens of joins with something more efficient to count individually grouped columns

There is a table (in PostgreSQL) with multiple columns (from n1 to n10) each of which contains individual digits in every row (in the example below the digits are 1, 2 and 3 for simplicity purposes). There are thousands of rows in the table. Excerpt of the data in the table:

+----+----+----+------+-----+
| n1 | n2 | n3 | n... | n10 |
+----+----+----+------+-----+
|  3 |  2 |  1 |    1 |   1 |
|  2 |  1 |  2 |    2 |   3 |
|  1 |  1 |  2 |    3 |   1 |
|  2 |  3 |  1 |    1 |   2 |
|  3 |  2 |  1 |    1 |   2 |
|  1 |  3 |  1 |    3 |   3 |
|  2 |  3 |  1 |    3 |   3 |
|  1 |  1 |  3 |    3 |   1 |
|  3 |  2 |  3 |    1 |   2 |
|  2 |  1 |  2 |    2 |   1 |
+----+----+----+------+-----+

What I'm trying to do is to count individual digits in every column so that the result table looks like this:

+--------+----------+----------+----------+------------+-----------+
| number | n1_count | n2_count | n3_count | n..._count | n10_count |
+--------+----------+----------+----------+------------+-----------+
|      1 |        3 |        4 |        5 |          4 |         4 |
|      2 |        4 |        3 |        3 |          2 |         3 |
|      3 |        3 |        3 |        2 |          4 |         3 |
+--------+----------+----------+----------+------------+-----------+

And I've managed to achieve this by using multiple left joins:

SELECT number, n1_count, n2_count, n3_count, n..._count, n10_count FROM
  (VALUES ('1'), ('2'), ('3') AS t (number)
LEFT JOIN
  (SELECT n1, COUNT(n1) AS n1_count FROM table GROUP BY n1) AS n1_ ON n1 = number
LEFT JOIN
  (SELECT n2, COUNT(d2) AS n2_count FROM table GROUP BY n2) AS n2_ ON n2 = number
LEFT JOIN
  (SELECT n3, COUNT(d3) AS n3_count FROM table GROUP BY n3) AS n3_ ON n3 = number
LEFT JOIN
  (SELECT n..., COUNT(d4) AS n..._count FROM table GROUP BY n...) AS n..._ ON n... = number
LEFT JOIN
  (SELECT n10, COUNT(d5) AS n10_count FROM table GROUP BY n10) AS n10_ ON n10 = number;

But the final query (10 left joins) looks scary huge and quite complicated so I am wondering if the same result can be achieved via more elegant and efficient way? Please point me to the options I have there.

Upvotes: 1

Views: 37

Answers (1)

user330315
user330315

Reputation:

You can use filtered aggregation and a single join using an array:

select t.number, 
       count(*) filter (where tt.n1 = t.number) as n1_count,
       count(*) filter (where tt.n2 = t.number) as n2_count,
       count(*) filter (where tt.n3 = t.number) as n3_count,
       count(*) filter (where tt.n4 = t.number) as n4_count
from the_table tt
  join (values (1),(2),(3) ) as t(number) on t.number = any(array[tt.n1,tt.n2,tt.n3,tt.n4])
group by t.number;  

Upvotes: 2

Related Questions