Uwe Keim
Uwe Keim

Reputation: 40746

Query for master records that have matching detail records

Currently I'm having the following table structure.

Master table Documents:

ID Filename
1 document1.pdf
2 document2.pdf
3 document3.pdf

Detail table Keywords:

ID DocumentID Keyword
1 1 KeywordA
2 1 KeywordB
3 1 KeywordC
4 2 KeywordB
5 3 KeywordA
6 3 KeywordD

Code to create this:

CREATE TABLE Documents (
    ID int IDENTITY(1,1) PRIMARY KEY,
    Filename nvarchar(255) NOT NULL
);

CREATE TABLE Keywords (
    ID int IDENTITY(1,1) PRIMARY KEY,
    DocumentID int NOT NULL,
    Keyword nvarchar(255) NOT NULL
);

INSERT INTO Documents(Filename) VALUES 
  ('document1.pdf'), ('document2.pdf'), ('document3.pdf');

INSERT INTO Keywords(DocumentID, Keyword) VALUES 
  (1, 'KeywordA'),
  (1, 'KeywordB'),
  (1, 'KeywordC'),
  (2, 'KeywordB'),
  (3, 'KeywordA'),
  (3, 'KeywordD');

SQL Fiddle for this.

Finding with one keyword

I'm looking for a way to get all documents matching a certain keyword.

This could be e.g. written with the following T-SQL query:

SELECT Documents.* 
FROM Documents 
WHERE Documents.ID IN
(
  SELECT Keywords.DocumentID 
  FROM Keywords 
  WHERE Keywords.Keyword = 'KeywordA'
)

This works successfully.

Finding with multiple keywords

What I'm currently stuck with is when I want to find all documents that match multiple keyword, combined with logical AND.

E.g. find a document that has three detail records with keyword A, B and C.

I think the following might work, but I don't know whether this performant or elegant at all:

SELECT Documents.* 
FROM Documents 
WHERE Documents.ID IN
(
  SELECT Keywords.DocumentID 
  FROM Keywords
  WHERE 
    Keywords.Keyword = 'KeywordA' OR 
    Keywords.Keyword = 'KeywordB' 
  GROUP BY Keywords.DocumentID HAVING COUNT(*) = 2 
)

SQL Fiddle for that.

My question

How to write a (performant) SQL query to find all documents that have multiple keywords associated.

If it is easier, a solution with a constant number of keywords (e.g. 3) would be sufficient.

Upvotes: 1

Views: 506

Answers (2)

Arulkumar
Arulkumar

Reputation: 13237

I hope the following query can help you

SELECT D.ID
FROM Documents D
JOIN Keywords K ON K.DocumentID = D.ID
WHERE K.Keyword IN ('KeywordA', 'KeywordB', 'KeywordC')  
GROUP BY D.ID
HAVING COUNT(DISTINCT K.Keyword) = 3 

Demo

Upvotes: 1

Charlieface
Charlieface

Reputation: 72153

The technique you are trying to do is called Relational Division With Remainder, in other words: find all groups which contain a particular set of rows.

Your current query is one of the standard ways of doing this, there are others.

If you had the keywords in a table variable or TVP, ...

DECLARE @keywords AS TABLE (Keyword varchar(50));
INSERT @keywords VALUES
('KeywordA'), ('KeywordB'), ('KeywordC');

... you could make it much neater with the following:

SELECT d.* 
FROM Documents d
WHERE d.ID IN
(
  SELECT k.DocumentID 
  FROM Keywords k
  JOIN @keywords kt ON kt.Keyword = k.Keyword
  GROUP BY k.DocumentID
  HAVING COUNT(*) = (SELECT COUNT(*) FROM @keywords)
);

Another option:

SELECT d.* 
FROM Documents d 
WHERE EXISTS (SELECT 1
  FROM @keywords kt
  LEFT JOIN Keywords k ON kt.Keyword = k.Keyword
      AND k.DocumentID = d.ID
  HAVING COUNT(*) = COUNT(k.keywords) -- there are no missing matches
);

And another, slightly confusing one:

SELECT d.* 
FROM Documents d
WHERE NOT EXISTS (SELECT 1
    FROM @keywords kt
    WHERE NOT EXISTS (SELECT 1
        FROM Keywords k
        WHERE k.Keyword = kt.Keyword
        AND K.DocumentID = d.ID
    )
);
-- For each document, there are no keywords for which there is no match

Upvotes: 1

Related Questions