Darthg8r
Darthg8r

Reputation: 12685

Exclusive-Or in SQL Many to Many Relationship

I have a scenario where I want to find a list of records in a table joined to another through a many to many relationship using an exclusive-or type of relationship. Given the contrived example below, I need a list of categories that are assigned to at least one article, but not all articles. I could brute force this by looping through all of the categories, but that's extremely inefficient. Is there a nice clean way to do this in T-SQL on MS SQL Server?

    CREATE TABLE [dbo].[ArticleCategories](
    [ArticleId] [int] NOT NULL,
    [CategoryId] [int] NOT NULL,
 CONSTRAINT [PK_ArticleCategories] PRIMARY KEY CLUSTERED 
(
    [ArticleId] ASC,
    [CategoryId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


CREATE TABLE [dbo].[Articles](
    [Id] [int] NOT NULL,
    [Title] [nvarchar](100) NOT NULL,
 CONSTRAINT [PK_Articles] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


CREATE TABLE [dbo].[Categories](
    [Id] [int] NOT NULL,
    [CategoryName] [nvarchar](100) NULL,
 CONSTRAINT [PK_Categories] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


insert into Articles ( Id, Title ) values ( 1, 'Jon Snow')
insert into Articles ( Id, Title ) values ( 2, 'Joffry Baratheon')
insert into Articles ( Id, Title ) values ( 3, 'Cercei Lanister')
insert into Articles ( Id, Title ) values ( 4, 'Sansa Stark')
insert into Articles ( Id, Title ) values ( 5, 'Khal Drogo')
insert into Articles ( Id, Title ) values ( 6, 'Ramsey Bolton')
insert into Articles ( Id, Title ) values ( 7, 'Melisandre')

insert into Categories ( Id, CategoryName ) values ( 1, 'Orange')
insert into Categories ( Id, CategoryName ) values ( 2, 'Blue')
insert into Categories ( Id, CategoryName ) values ( 3, 'Purple')
insert into Categories ( Id, CategoryName ) values ( 4, 'Green')
insert into Categories ( Id, CategoryName ) values ( 5, 'Violet')
insert into Categories ( Id, CategoryName ) values ( 6, 'Yellow')
insert into Categories ( Id, CategoryName ) values ( 7, 'Black')

insert into ArticleCategories (ArticleId, CategoryId) values (1, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (2, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (3, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (4, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (5, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (6, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (7, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (2, 2 )
insert into ArticleCategories (ArticleId, CategoryId) values (3, 2 )
insert into ArticleCategories (ArticleId, CategoryId) values (5, 3 )
insert into ArticleCategories (ArticleId, CategoryId) values (7, 3 )

In this scenario, the query would not return the category 'Orange' because it is assigned to all of the Articles. It would return 'Blue' and 'Purple' because they are assigned to at least one article, but not all. The other categories will not return at all because they aren't assigned at all.

The expected results would be:

2|Blue
3|Purple 

Updated to include sample data and expected output.

Upvotes: 1

Views: 74

Answers (2)

Olivier Jacot-Descombes
Olivier Jacot-Descombes

Reputation: 112482

The conditions can be tested without joins. The join is only necessary for the category name in the result

SELECT AC.CategoryId, C.CategoryName
FROM
    ArticleCategories AC
    INNER JOIN Categories C
        ON AC.CategoryId = C.CategoryId
GROUP BY AC.CategoryID
HAVING Count(*) < (SELECT Count(*) FROM Articles)

The table ArticleCategories contains only information on groups that have been assigned to an article at least once, therefore no extra condition is required for this.

Since the Primary Key of ArticleCategories includes both columns (ArticleId and CategoryId) there can be no duplicate article per category. Therefore, the count per category is equal to the number of articles this category has been assigned to.

Note that I am using the HAVING-clause, not the WHERE-clause. The WHERE-clause is applied before grouping. The HAVING-clause is applied after grouping and can refer to aggregate results.

Upvotes: 2

xQbert
xQbert

Reputation: 35333

Using your sample: http://rextester.com/THCJ13143

and a query using group by and having:

SELECT AC.CategoryID, c.categoryName
FROM ArticleCategories AC
LEFT JOIN Categories C
  on C.ID = AC.CategoryID
GROUP BY AC.CategoryID, c.Categoryname
HAVING count(AC.ArticleID) < (SELECT count(*) FROM Articles)

We get:

CategoryID  categoryName
 2          Blue
 3          Purple

Upvotes: 1

Related Questions