Reputation: 121

Sum of two counts from one table with additional data from another table

I have two tables as follows:

TABLE A

| id | col_a | col_b | user_id |
--------------------------------
| 1  | false | true  | 1       |
| 2  | false | true  | 2       |
| 3  | true  | true  | 2       |
| 4  | true  | true  | 3       |
| 5  | true  | false | 1       |

TABLE B

| id | name  |
--------------
| 1  | Bob   |
| 2  | Jim   | 
| 3  | Helen |
| 4  | Michael|
| 5  | Jen   |

I want to get the sum of two counts, which are the number of true values in col_a and number of true values in col_b. I want to group that data by user_id. I also want to join Table B and get the name of each user. The result would look like this:

|user_id|total (col_a + col_b)|name
------------------------------------
| 1     | 2                   | Bob
| 2     | 3                   | Jim
| 3     | 2                   | Helen

So far I got the total sum with the following query:

 SELECT
(SELECT COUNT(*) FROM "TABLE_A" WHERE "col_a" is true)+
(SELECT COUNT(*) FROM "TABLE_A" WHERE "col_b" is true)
as total

However, I'm not sure how to proceed with grouping these counts by user_id.

Upvotes: 1

Answers (3)

Erwin Brandstetter

Reputation: 656391

Something like this is typically fastest:

SELECT *
FROM   "TABLE_B" b
JOIN  (
   SELECT user_id AS id
        , count(*) FILTER (WHERE col_a)
        + count(*) FILTER (WHERE col_b) AS total
   FROM   "TABLE_A"
   GROUP  BY 1
   ) a USING (id);

While fetching all rows, aggregate first, join later. That's cheaper. See:

Query with LEFT JOIN not returning rows for count of 0

The aggregate FILTER clause is typically fastest. See:

Often, you want to keep total counts of 0 in the result. You did say:

get the name of each user.

SELECT b.id AS user_id, b.name, COALESCE(a.total, 0) AS total
FROM   "TABLE_B" b
LEFT   JOIN (
   SELECT user_id AS id
        , count(col_a OR NULL)
        + count(col_b OR NULL) AS total
   FROM   "TABLE_A"
   GROUP  BY 1
   ) a USING (id);
...

count(col_a OR NULL) is an equivalent alternative, shortest, and still fast. (Use the FILTER clause from above for best performance.)
The LEFT JOIN keeps all rows from "TABLE_B" in the result.
COALESCE() return 0 instead of NULL for the total count.

If col_a and col_b have only few true values, this is typically (much) faster - basically what you had already:

SELECT b.*, COALESCE(aa.ct, 0) + COALESCE(ab.ct, 0) AS total
FROM   "TABLE_B" b
LEFT   JOIN (
   SELECT user_id AS id, count(*) AS ct
   FROM   "TABLE_A"
   WHERE  col_a
   GROUP  BY 1
   ) aa USING (id)
LEFT   JOIN (
   SELECT user_id AS id, count(*) AS ct
   FROM   "TABLE_A"
   WHERE  col_b
   GROUP  BY 1
   ) ab USING (id);

Especially with (small in this case!) partial indexes like:

CREATE INDEX a_true_idx on "TABLE_A" (user_id) WHERE col_a;
CREATE INDEX b_true_idx on "TABLE_A" (user_id) WHERE col_b;

Aside: use legal, lower-case unquoted names in Postgres to make your like simpler.

Are PostgreSQL column names case-sensitive?

Upvotes: 1

Rui Lima

Reputation: 7403

You are double counting JIM, if that is not supposed since it only shows up in two rows and not three, maybe you can do the following:

with cte_A as (
    select col_a as col, user_id 
    from A 
    where col_a=true
    union -- ALL -- (if you want to double count Jim)
    select col_b as col, user_id 
    from A 
    where col_b=true
)
select B.user_id, sum(*) as total, B.name
from cte_A
join B
on cte_A.user_id = B.user_id
group by B.user_id

If you want to actually double count then use the UNION ALL instead of UNION

Upvotes: 0

eshirvana

Reputation: 24568

select user_id,name
 , count(case when col_a = true then 1 end)
 + count(case when col_b = true then 1 end) total
from tableA a
join TableB b on a.user_id= b.id 
group by user_id,name

Upvotes: 1

Sum of two counts from one table with additional data from another table

Answers (3)

Related Questions