Reputation: 12685
I have a scenario where I want to find a list of records in a table joined to another through a many to many relationship using an exclusive-or type of relationship. Given the contrived example below, I need a list of categories that are assigned to at least one article, but not all articles. I could brute force this by looping through all of the categories, but that's extremely inefficient. Is there a nice clean way to do this in T-SQL on MS SQL Server?
CREATE TABLE [dbo].[ArticleCategories](
[ArticleId] [int] NOT NULL,
[CategoryId] [int] NOT NULL,
CONSTRAINT [PK_ArticleCategories] PRIMARY KEY CLUSTERED
(
[ArticleId] ASC,
[CategoryId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[Articles](
[Id] [int] NOT NULL,
[Title] [nvarchar](100) NOT NULL,
CONSTRAINT [PK_Articles] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[Categories](
[Id] [int] NOT NULL,
[CategoryName] [nvarchar](100) NULL,
CONSTRAINT [PK_Categories] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
insert into Articles ( Id, Title ) values ( 1, 'Jon Snow')
insert into Articles ( Id, Title ) values ( 2, 'Joffry Baratheon')
insert into Articles ( Id, Title ) values ( 3, 'Cercei Lanister')
insert into Articles ( Id, Title ) values ( 4, 'Sansa Stark')
insert into Articles ( Id, Title ) values ( 5, 'Khal Drogo')
insert into Articles ( Id, Title ) values ( 6, 'Ramsey Bolton')
insert into Articles ( Id, Title ) values ( 7, 'Melisandre')
insert into Categories ( Id, CategoryName ) values ( 1, 'Orange')
insert into Categories ( Id, CategoryName ) values ( 2, 'Blue')
insert into Categories ( Id, CategoryName ) values ( 3, 'Purple')
insert into Categories ( Id, CategoryName ) values ( 4, 'Green')
insert into Categories ( Id, CategoryName ) values ( 5, 'Violet')
insert into Categories ( Id, CategoryName ) values ( 6, 'Yellow')
insert into Categories ( Id, CategoryName ) values ( 7, 'Black')
insert into ArticleCategories (ArticleId, CategoryId) values (1, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (2, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (3, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (4, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (5, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (6, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (7, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (2, 2 )
insert into ArticleCategories (ArticleId, CategoryId) values (3, 2 )
insert into ArticleCategories (ArticleId, CategoryId) values (5, 3 )
insert into ArticleCategories (ArticleId, CategoryId) values (7, 3 )
In this scenario, the query would not return the category 'Orange' because it is assigned to all of the Articles. It would return 'Blue' and 'Purple' because they are assigned to at least one article, but not all. The other categories will not return at all because they aren't assigned at all.
The expected results would be:
2|Blue
3|Purple
Updated to include sample data and expected output.
Upvotes: 1
Views: 74
Reputation: 112482
The conditions can be tested without joins. The join is only necessary for the category name in the result
SELECT AC.CategoryId, C.CategoryName
FROM
ArticleCategories AC
INNER JOIN Categories C
ON AC.CategoryId = C.CategoryId
GROUP BY AC.CategoryID
HAVING Count(*) < (SELECT Count(*) FROM Articles)
The table ArticleCategories
contains only information on groups that have been assigned to an article at least once, therefore no extra condition is required for this.
Since the Primary Key of ArticleCategories
includes both columns (ArticleId
and CategoryId
) there can be no duplicate article per category. Therefore, the count per category is equal to the number of articles this category has been assigned to.
Note that I am using the HAVING-clause, not the WHERE-clause. The WHERE-clause is applied before grouping. The HAVING-clause is applied after grouping and can refer to aggregate results.
Upvotes: 2
Reputation: 35333
Using your sample: http://rextester.com/THCJ13143
and a query using group by and having:
SELECT AC.CategoryID, c.categoryName
FROM ArticleCategories AC
LEFT JOIN Categories C
on C.ID = AC.CategoryID
GROUP BY AC.CategoryID, c.Categoryname
HAVING count(AC.ArticleID) < (SELECT count(*) FROM Articles)
We get:
CategoryID categoryName
2 Blue
3 Purple
Upvotes: 1