Exclusive-Or in SQL Many to Many Relationship

Question

I have a scenario where I want to find a list of records in a table joined to another through a many to many relationship using an exclusive-or type of relationship. Given the contrived example below, I need a list of categories that are assigned to at least one article, but not all articles. I could brute force this by looping through all of the categories, but that's extremely inefficient. Is there a nice clean way to do this in T-SQL on MS SQL Server?

    CREATE TABLE [dbo].[ArticleCategories](
    [ArticleId] [int] NOT NULL,
    [CategoryId] [int] NOT NULL,
 CONSTRAINT [PK_ArticleCategories] PRIMARY KEY CLUSTERED 
(
    [ArticleId] ASC,
    [CategoryId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


CREATE TABLE [dbo].[Articles](
    [Id] [int] NOT NULL,
    [Title] [nvarchar](100) NOT NULL,
 CONSTRAINT [PK_Articles] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


CREATE TABLE [dbo].[Categories](
    [Id] [int] NOT NULL,
    [CategoryName] [nvarchar](100) NULL,
 CONSTRAINT [PK_Categories] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


insert into Articles ( Id, Title ) values ( 1, 'Jon Snow')
insert into Articles ( Id, Title ) values ( 2, 'Joffry Baratheon')
insert into Articles ( Id, Title ) values ( 3, 'Cercei Lanister')
insert into Articles ( Id, Title ) values ( 4, 'Sansa Stark')
insert into Articles ( Id, Title ) values ( 5, 'Khal Drogo')
insert into Articles ( Id, Title ) values ( 6, 'Ramsey Bolton')
insert into Articles ( Id, Title ) values ( 7, 'Melisandre')

insert into Categories ( Id, CategoryName ) values ( 1, 'Orange')
insert into Categories ( Id, CategoryName ) values ( 2, 'Blue')
insert into Categories ( Id, CategoryName ) values ( 3, 'Purple')
insert into Categories ( Id, CategoryName ) values ( 4, 'Green')
insert into Categories ( Id, CategoryName ) values ( 5, 'Violet')
insert into Categories ( Id, CategoryName ) values ( 6, 'Yellow')
insert into Categories ( Id, CategoryName ) values ( 7, 'Black')

insert into ArticleCategories (ArticleId, CategoryId) values (1, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (2, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (3, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (4, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (5, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (6, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (7, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (2, 2 )
insert into ArticleCategories (ArticleId, CategoryId) values (3, 2 )
insert into ArticleCategories (ArticleId, CategoryId) values (5, 3 )
insert into ArticleCategories (ArticleId, CategoryId) values (7, 3 )

In this scenario, the query would not return the category 'Orange' because it is assigned to all of the Articles. It would return 'Blue' and 'Purple' because they are assigned to at least one article, but not all. The other categories will not return at all because they aren't assigned at all.

The expected results would be:

2|Blue
3|Purple

Updated to include sample data and expected output.

Olivier Jacot-Descombes · Accepted Answer

The conditions can be tested without joins. The join is only necessary for the category name in the result

SELECT AC.CategoryId, C.CategoryName
FROM
    ArticleCategories AC
    INNER JOIN Categories C
        ON AC.CategoryId = C.CategoryId
GROUP BY AC.CategoryID
HAVING Count(*) < (SELECT Count(*) FROM Articles)

The table ArticleCategories contains only information on groups that have been assigned to an article at least once, therefore no extra condition is required for this.

Since the Primary Key of ArticleCategories includes both columns (ArticleId and CategoryId) there can be no duplicate article per category. Therefore, the count per category is equal to the number of articles this category has been assigned to.

Note that I am using the HAVING-clause, not the WHERE-clause. The WHERE-clause is applied before grouping. The HAVING-clause is applied after grouping and can refer to aggregate results.

Exclusive-Or in SQL Many to Many Relationship

Answers (2)

Related Questions