Reputation: 1331
I have a requirement of getting rows that EXACTLY match a list of labels or tags. I have a Labels main table, ContentLabels table with Label FK, and a Contents table.
Content Table: Id, Name, CustomerId
Label table: Id, Text
ContentLabels Table: ContentId, LabelId
I have a list of labels (any number of labels), i want to match exactly those labels while querying the data from sql. I tried the following solution from SO somewhere:
DECLARE @LabelTexts NVARCHAR(MAX) = 'coke,edible,chips';
DECLARE @Labels AS TABLE(
[Text] NVARCHAR(128)
);
INSERT INTO @Labels ([Text])
SELECT [Data] FROM StrSplit(@LabelTexts, ',')
DECLARE @LabelsCount INT = (SELECT COUNT(*) FROM @Labels);
SELECT c.[CustomerID] ,
c.[ContentID] AS Id ,
c.ClusterId
FROM ContentLabels cbl
JOIN Labels l ON l.ClusterId = cbl.LabelClusterId
JOIN Content c ON c.ClusterId = cbl.ContentClusterId
GROUP BY c.[CustomerID],
c.[ContentID],
c.ClusterId
HAVING
COUNT(DISTINCT CASE WHEN l.[Text] IN( 'coke','chips','edible') THEN l.[Text] END)=@LabelsCount
AND COUNT(DISTINCT CASE WHEN l.[Text] NOT IN( 'coke','chips','edible') THEN l.[Text] END) = 0;
When the above query execues, it works perfectly fine as i have added tags/labels in hard form in the IN clause in Having clause. But when i added the following query in the HAVING clause:
HAVING COUNT(DISTINCT CASE WHEN l.[Text] IN(SELECT * FROM @Labels) THEN l.[Text] END)=@LabelsCount
AND COUNT(DISTINCT CASE WHEN l.[Text] NOT IN(SELECT * FROM @Labels) THEN l.[Text] END) = 0;
I get the following error:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
And it is just because of
SELECT * FROM @Labels
in having clause. Its been several hours looking into the solution for it but didn't succeed. Any help would be commendable.
Upvotes: 0
Views: 781
Reputation: 71995
You need to left join the matches you want, then do conditional aggergation.
This type of query is called "Relational Division Without Remainder", see Joe Celko et al on this.
SELECT c.[CustomerID] ,
c.[ContentID] AS Id ,
c.ClusterId
FROM ContentLabels cbl
JOIN Labels l ON l.ClusterId = cbl.LabelClusterId
JOIN Content c ON c.ClusterId = cbl.ContentClusterId
LEFT JOIN @labels labelMatches ON labelMatches.[Text] = l.[Text]
GROUP BY c.[CustomerID],
c.[ContentID],
c.ClusterId
HAVING
COUNT(DISTINCT labelMatches.[Text] END)=@LabelsCount
AND COUNT(CASE WHEN labelMatches.[Text] IS NULL THEN 1 END) = 0;
Upvotes: 1
Reputation: 72185
You can use a combination of NOT EXITS
and GROUP BY
:
SELECT
c.[CustomerID]
, c.[ContentID]
, c.ClusterId
FROM Content c
JOIN ContentLabels cl
JOIN Labels l ON l.ClusterId = cl.LabelClusterId
WHERE NOT EXISTS
(
-- to filter out c.ClusterId values being related to labels NOT IN the specified list
SELECT 1
FROM ContentLabels cl2
JOIN Labels l2 ON l2.ClusterId = cl2.LabelClusterId
WHERE (l2.[Text] NOT IN(SELECT * FROM @Labels))
AND c.ClusterId = cl2.ContentClusterId
)
GROUP BY
c.[CustomerID]
, c.[ContentID]
, c.ClusterId
HAVING COUNT(DISTINCT l.[Text]) = @LabelsCount
Upvotes: 0
Reputation: 164139
Use the operator LIKE:
HAVING COUNT(DISTINCT CASE WHEN ',' + @LabelTexts + ',' LIKE '%,' + l.[Text] + ',%' THEN l.[Text] END) = @LabelsCount
AND COUNT(DISTINCT CASE WHEN ',' + @LabelTexts + ',' NOT LIKE '%,' + l.[Text] + ',%' THEN l.[Text] END) = 0;
Upvotes: 0