Reputation: 67251
Let's say I'm implementing articles with article tags. I'm using SQL Server 2008.
TABLE Articles
ArtID INT
...
TABLE Tags
TagID INT
TagText VARCHAR(10)
TABLE ArticleTags
ArtID INT
TagID INT
I'm trying to figure out the most efficient way to query all articles with specific tags. Here are two options, both of which I've read are the most efficient.
Method A:
SELECT a.* FROM Articles a
WHERE EXISTS (
SELECT * FROM ArticleTags at
INNER JOIN Tags t ON at.TagID = t.TagID
WHERE at.ArtID = a.ArtID
AND t.TagText IN ('abc', 'def')
)
Method B:
SELECT a.* FROM Articles a
INNER JOIN ArticleTags at ON a.ArtID = at.ArtID
INNER JOIN Tags t ON at.TagID = t.TagID
WHERE t.TagText IN ('abc', 'def')
GROUP BY a.ArtID
Can any SQL experts suggest which is more efficient and why? Or maybe I'm on the wrong track.
Upvotes: 0
Views: 417
Reputation: 294387
As with almost all SQL performance questions, the answer is not the query but the data schema. The indexes you have are what drive the performance of your queries.
Usually a many-to-many relation requires two complementary indexes, one as (ID1, ID2) and the other as (ID2, ID1). One of them is clustered; it doesn't really matter which one. So let's create a test DB (100k articles, 1K tags, 1-10 tags per article):
:setvar dbname testdb
:setvar articles 100000
:setvar tags 1000
:setvar articletags 10
:on error exit
set xact_abort on;
go
use master;
go
if db_id('$(dbname)') is not null
begin
alter database [$(dbname)] set single_user with rollback immediate;
drop database [$(dbname)];
end
go
create database [$(dbname)];
go
use [$(dbname)];
go
create TABLE Articles (
ArtID INT not null identity(1,1),
name varchar(100) not null,
filler char(500) not null default replicate('X', 500),
constraint pk_Articles primary key clustered (ArtID));
go
create table Tags (
TagID INT not null identity(1,1),
TagText VARCHAR(10) not null,
constraint pk_Tags primary key clustered (TagID),
constraint unq_Tags_Text unique (TagText));
go
create TABLE ArticleTags (
ArtID INT not null,
TagID INT not null,
constraint fk_Articles
foreign key (ArtID)
references Articles (ArtID),
constraint fk_Tags
foreign key (TagID)
references Tags (TagID),
constraint pk_ArticleTags
primary key clustered (ArtID, TagID));
go
create nonclustered index ndxArticleTags_TagID
on ArticleTags (TagID, ArtID);
go
-- populate articles
set nocount on;
declare @i int =0, @name varchar(100);
begin transaction
while @i < $(articles)
begin
set @name = 'Name ' + cast(@i as varchar(10));
insert into Articles (name) values (@name);
set @i += 1;
if @i %1000 = 0
begin
commit;
raiserror (N'Inserted %d articles', 0, 1, @i);
begin transaction;
end
end
commit
go
-- populate tags
set nocount on;
declare @i int =0, @text varchar(100);
begin transaction
while @i < $(tags)
begin
set @text = 'Tag ' + cast(@i as varchar(10));
insert into Tags (TagText) values (@text);
set @i += 1;
if @i %1000 = 0
begin
commit;
raiserror (N'Inserted %d tags', 0, 1, @i);
begin transaction;
end
end
commit
go
-- populate article-tags
set nocount on;
declare @i int =0, @a int = 1, @cnt int, @tag int;
set @cnt = rand() * $(articletags) + 1;
set @tag = rand() * $(tags) + 1;
begin transaction
while @a < $(articles)
begin
insert into ArticleTags (ArtID, TagID) values (@a, @tag);
set @cnt -= 1;
set @tag += rand()*10+1;
if $(tags)<=@tag
begin
set @tag = 1;
end
if @cnt = 0
begin
set @cnt = rand() * $(articletags) + 1;
set @tag = rand() * $(tags) + 1;
set @a += 1;
end
set @i += 1;
if @i %1000 = 0
begin
commit;
raiserror (N'Inserted %d article-tags', 0, 1, @i);
begin transaction;
end
end
commit
raiserror (N'Final: %d article-tags', 0, 1, @i);
go
Now let's compare the two queries:
set statistics io on;
set statistics time on;
select a.ArtID
from Articles a
where exists (
select *
from ArticleTags at
join Tags t on at.TagID = t.TagID
where at.ArtID = a.ArtID
and t.TagText in ('Tag 10', 'Tag 12'));
SELECT a.ArtID FROM Articles a
INNER JOIN ArticleTags at ON a.ArtID = at.ArtID
INNER JOIN Tags t ON at.TagID = t.TagID
WHERE t.TagText IN ('Tag 10', 'Tag 12')
GROUP BY a.ArtID
Result:
Table 'Articles'. Scan count 0, logical reads 3561, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ArticleTags'. Scan count 2, logical reads 13, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tags'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Articles'. Scan count 0, logical reads 3561, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ArticleTags'. Scan count 2, logical reads 13, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tags'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Surprise! (well, not really). They're IDENTICAL. In fact, they have exactly the same execution plan.
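If you want to check this on your own data rather than take the I/O numbers on trust, one way is to capture the actual plans and diff the result sets. This is only a sketch against the test schema above: SET STATISTICS XML returns the actual plan for each statement, and EXCEPT lists any rows the first query returns that the second does not.

```sql
-- Return the actual execution plan (as XML) alongside each result set.
set statistics xml on;
go
select a.ArtID
from Articles a
where exists (
  select *
  from ArticleTags at
  join Tags t on at.TagID = t.TagID
  where at.ArtID = a.ArtID
  and t.TagText in ('Tag 10', 'Tag 12'));

select a.ArtID
from Articles a
join ArticleTags at on a.ArtID = at.ArtID
join Tags t on at.TagID = t.TagID
where t.TagText in ('Tag 10', 'Tag 12')
group by a.ArtID;
go
set statistics xml off;
go
-- Rows returned by the EXISTS form but missing from the JOIN form.
-- An empty result means none; swap the two halves to check the other direction.
select a.ArtID
from Articles a
where exists (
  select *
  from ArticleTags at
  join Tags t on at.TagID = t.TagID
  where at.ArtID = a.ArtID
  and t.TagText in ('Tag 10', 'Tag 12'))
except
select a.ArtID
from Articles a
join ArticleTags at on a.ArtID = at.ArtID
join Tags t on at.TagID = t.TagID
where t.TagText in ('Tag 10', 'Tag 12')
group by a.ArtID;
```

The plans and read counts you get will depend on your own indexes and data distribution, so treat the numbers above as specific to this test DB.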
Upvotes: 3
Reputation: 11
I would create an indexed view based on the 3 tables on the columns artID and TagText. That way you can use:
SELECT *
FROM Articles
WHERE artID IN
(SELECT artID
FROM ArticleTagTextView
WHERE TagText IN ('abc', 'def'))
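For illustration, here is roughly what such a view could look like. Treat it as a sketch with assumptions: the view definition and the index name cdx_ArticleTagTextView are made up for this example, the tables are assumed to live in the dbo schema, and only ArticleTags and Tags are joined because ArtID and TagText are the only columns exposed (joining Articles as well would follow the same pattern).

```sql
-- Hypothetical definition of ArticleTagTextView.
-- Indexed views require SCHEMABINDING and two-part object names,
-- and the session must use the SET options required for indexed views
-- (ANSI_NULLS and QUOTED_IDENTIFIER ON, among others).
create view dbo.ArticleTagTextView
with schemabinding
as
select at.ArtID, t.TagText
from dbo.ArticleTags at
join dbo.Tags t on at.TagID = t.TagID;
go
-- The first index on a view must be unique and clustered.
-- (TagText, ArtID) is unique as long as TagText is unique in Tags
-- and (ArtID, TagID) is unique in ArticleTags.
create unique clustered index cdx_ArticleTagTextView
  on dbo.ArticleTagTextView (TagText, ArtID);
go
-- NOEXPAND asks the optimizer to read the view's index directly;
-- editions other than Enterprise do not match indexed views automatically.
select a.*
from dbo.Articles a
where a.ArtID in (
  select v.ArtID
  from dbo.ArticleTagTextView v with (noexpand)
  where v.TagText in ('abc', 'def'));
```

Keep in mind that the view's index has to be maintained on every insert, update, or delete against the underlying tables, so it trades write cost for read speed.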
Upvotes: 1
Reputation: 1694
In short: no difference. Both will be translated to the same execution plan.
Edit: I hadn't noticed the GROUP BY. As written, the query most likely won't compile. Remove the GROUP BY clause or list all fields of the table, like GROUP BY Id, Name, ...
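As a sketch of the second option, here is Method B with every selected column listed in the GROUP BY. The question does not show the columns of Articles, so name and filler are assumptions borrowed from the test schema in the other answer; substitute your real columns.

```sql
-- Every non-aggregated column in the SELECT list must appear in GROUP BY.
-- name and filler are assumed column names, not taken from the question.
select a.ArtID, a.name, a.filler
from Articles a
join ArticleTags at on a.ArtID = at.ArtID
join Tags t on at.TagID = t.TagID
where t.TagText in ('abc', 'def')
group by a.ArtID, a.name, a.filler;
```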
Upvotes: 0
Reputation: 7769
Your Method B has a GROUP BY clause, but you're returning all columns from Articles, presumably including columns that are neither aggregated nor listed in the GROUP BY. This would throw an error. The GROUP BY is probably unnecessary.
Without the GROUP BY, the queries have roughly the same execution plan. However, Method B is a more standard SQL query statement.
Edit: DISTINCT is usually preferable to a GROUP BY in this case, and has the same function:
SELECT DISTINCT
a.*
FROM
Articles a
INNER JOIN
ArticleTags at
ON
a.ArtID = at.ArtID
INNER JOIN
Tags t
ON
at.TagID = t.TagID
WHERE
t.TagText IN ('abc', 'def')
Upvotes: 1