Harry .Naeem
Harry .Naeem

Reputation: 1331

Match a list of tags/labels EXACTLY in sql table

I have a requirement of getting rows that EXACTLY match a list of labels or tags. I have a Labels main table, ContentLabels table with Label FK, and a Contents table.

Content Table: Id, Name, CustomerId

Label table: Id, Text

ContentLabels Table: ContentId, LabelId

I have a list of labels (any number of labels), i want to match exactly those labels while querying the data from sql. I tried the following solution from SO somewhere:

DECLARE @LabelTexts NVARCHAR(MAX) = 'coke,edible,chips';
DECLARE @Labels AS TABLE(
    [Text] NVARCHAR(128)
);
INSERT INTO @Labels ([Text])
SELECT [Data] FROM StrSplit(@LabelTexts, ',')

DECLARE @LabelsCount INT = (SELECT COUNT(*) FROM @Labels);

        SELECT  c.[CustomerID] ,
                c.[ContentID] AS Id ,
                c.ClusterId
         FROM ContentLabels cbl
         JOIN Labels l ON l.ClusterId = cbl.LabelClusterId
         JOIN Content c ON c.ClusterId = cbl.ContentClusterId
         GROUP BY c.[CustomerID], 
                c.[ContentID],
                c.ClusterId
        HAVING 
            COUNT(DISTINCT CASE WHEN l.[Text] IN( 'coke','chips','edible') THEN l.[Text] END)=@LabelsCount
            AND COUNT(DISTINCT CASE WHEN l.[Text] NOT IN( 'coke','chips','edible') THEN l.[Text] END) = 0;

When the above query execues, it works perfectly fine as i have added tags/labels in hard form in the IN clause in Having clause. But when i added the following query in the HAVING clause:

HAVING COUNT(DISTINCT CASE WHEN l.[Text] IN(SELECT * FROM @Labels) THEN l.[Text] END)=@LabelsCount
            AND COUNT(DISTINCT CASE WHEN l.[Text] NOT IN(SELECT * FROM @Labels) THEN l.[Text] END) = 0;

I get the following error:

Cannot perform an aggregate function on an expression containing an aggregate or a subquery.

And it is just because of

SELECT * FROM @Labels

in having clause. Its been several hours looking into the solution for it but didn't succeed. Any help would be commendable.

Upvotes: 0

Views: 781

Answers (3)

Charlieface
Charlieface

Reputation: 71995

You need to left join the matches you want, then do conditional aggergation.

This type of query is called "Relational Division Without Remainder", see Joe Celko et al on this.

SELECT  c.[CustomerID] ,
                c.[ContentID] AS Id ,
                c.ClusterId
    FROM ContentLabels cbl
    JOIN Labels l ON l.ClusterId = cbl.LabelClusterId
    JOIN Content c ON c.ClusterId = cbl.ContentClusterId
    LEFT JOIN @labels labelMatches ON labelMatches.[Text] = l.[Text]
    GROUP BY c.[CustomerID], 
        c.[ContentID],
        c.ClusterId
    HAVING 
        COUNT(DISTINCT labelMatches.[Text] END)=@LabelsCount
        AND COUNT(CASE WHEN labelMatches.[Text] IS NULL THEN 1 END) = 0;

Upvotes: 1

Giorgos Betsos
Giorgos Betsos

Reputation: 72185

You can use a combination of NOT EXITS and GROUP BY:

SELECT 
    c.[CustomerID]
    , c.[ContentID]
    , c.ClusterId
FROM Content c
JOIN ContentLabels cl
JOIN Labels l ON l.ClusterId = cl.LabelClusterId
WHERE NOT EXISTS 
(
    -- to filter out c.ClusterId values being related to labels NOT IN the specified list
    SELECT 1
    FROM ContentLabels cl2
    JOIN Labels l2 ON l2.ClusterId = cl2.LabelClusterId
    WHERE (l2.[Text] NOT IN(SELECT * FROM @Labels)) 
            AND c.ClusterId = cl2.ContentClusterId
)
GROUP BY  
    c.[CustomerID]
    , c.[ContentID]
    , c.ClusterId
HAVING COUNT(DISTINCT l.[Text]) = @LabelsCount

Upvotes: 0

forpas
forpas

Reputation: 164139

Use the operator LIKE:

HAVING COUNT(DISTINCT CASE WHEN ',' + @LabelTexts + ',' LIKE '%,' + l.[Text] + ',%' THEN l.[Text] END) = @LabelsCount
   AND COUNT(DISTINCT CASE WHEN ',' + @LabelTexts + ',' NOT LIKE '%,' + l.[Text] + ',%' THEN l.[Text] END) = 0;

Upvotes: 0

Related Questions