wheresmycookie
wheresmycookie

Reputation: 773

Postgres count total matches per group

Input data

I have the following association table:

AssociationTable
 - Item ID: Integer
 - Tag ID: Integer

Referring to the following example data

Item  Tag
1     1
1     2
1     3
2     1

and some input list of tags T (e.g. [1, 2])

What I want

For each item, I would like to know which tags were not provided in the input list T.

With our sample data, we'd get:

Item   Num missing
1      1
2      0

My thoughts

The best I've done so far is: select "ItemId", count("TagId") as "Num missing" from "AssociationTab" where "TagId" not in (1) group by "ItemId";

The problem here is that items where all tags match will not be included in the output.

Upvotes: 1

Views: 52

Answers (2)

Dirbaio
Dirbaio

Reputation: 3142

Simplest solution is

SELECT
    ItemId,
    count(*) FILTER (WHERE TagId NOT IN (1,2))
FROM AssociationTab
GROUP BY ItemId

Alternatively, if you already have an Items table with the item list, you could do this:

SELECT
    i.ItemId,
    count(a.TagId)
FROM Items i
LEFT JOIN AssociationTab a ON a.ItemId = i.ItemId AND a.TagId NOT IN (1,2)
GROUP BY i.ItemId

The key is that LEFT JOIN does not remove the Items row if no tags match.

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521914

You could use a calendar table with anti-join approach:

WITH cte AS (
    SELECT t1.Item, t2.Tag
    FROM (SELECT DISTINCT Item FROM AssociationTable) t1
    CROSS JOIN (SELECT 1 AS Tag UNION ALL SELECT 2) t2
)

SELECT
    t1.Item,
    COUNT(*) FILTER (WHERE t2.Item IS NULL) AS num_missing
FROM cte t1
LEFT JOIN AssociationTable t2
    ON t1.Item = t2.Item AND
       t1.Tag = t2.Tag AND
       t2.Tag IN (1, 2)
GROUP BY
    t1.Item;

Demo

The strategy here is to build a calendar/reference table in the first CTE which contains all combinations of items and tags. Then, we left join this CTE to your association table, aggregate by item, and then detect how many tags are missing for each item.

Upvotes: 2

Related Questions