Reputation: 781
Using the following schema:
create table awards(
aid int primary key
, name varchar(100) not null );
create table institutions(
iid int primary key
, name varchar(100) not null );
create table winners(
aid int
, iid int
, year int
, filmname varchar(100)
, personname varchar(100)
, primary key (aid, iid, year)
, foreign key tid references awards(aid)
, foreign key cid references institutions(iid) );
I created the following query:
SELECT nominees.personname as personname, awards.name as award, nominees.year as year
FROM nominees, institutions, awards WHERE institutions.iid = nominees.iid and
awards.aid = nominees.aid and personname is not null
GROUP BY nominees.personname, awards.name, nominees.year
HAVING ((awards.name, count(DISTINCT institutions.name)) in
(SELECT awards.name as
awards, count(DISTINCT institutions.name)
FROM nominees, awards, institutions
WHERE nominees.aid = awards.aid and nominees.iid = institutions.iid
GROUP BY awards.name))
ORDER BY nominees.personname, awards.name;
This query is designed to find all awards in which a single person has been nominated by every institution giving that award in a specific year. It essentially takes one person and counts the number of institutions that gave them a single award and compares that value to the max number of institutions that give that award.
The desired output should look something like the following:
"personname" "award" "year"
"Alexandre" "score" "2011"
"Skyfall" "song" "2013"
"Tangled" "song" "2011"
This gives the sets that I desire however I'm not sure whether or not it would be more efficient to do it a different way. I've attempted to get it to work with EXISTS but I didn't have much luck.
Main Question: Is there a more efficient way to do this query?
Upvotes: 1
Views: 91
Reputation: 753725
As usual with complex queries, I use TDQD — Test-Driven Query Design — to solve the problem in stages. Each stage can be tested individually, and the results checked, ensuring that you reach the correct answer.
I note that you show us three tables; your query uses two of those but mentions a fourth, nominees
. I am assuming that winners
is the same as nominees
, since you gave us that schema and asked about who won the award from every institution offering the award in a given year.
SELECT aid, year, COUNT(*) AS num_awards
FROM winners
GROUP BY aid, year;
SELECT aid, year, personname, COUNT(*) AS num_person_awards
FROM winners
GROUP BY aid, year, personname;
SELECT n.aid, n.year, w.personname
FROM (SELECT aid, year, COUNT(*) AS num_awards
FROM winners
GROUP BY aid, year
) AS n
JOIN (SELECT aid, year, personname, COUNT(*) AS num_person_awards
FROM winners
GROUP BY aid, year, personname
) AS w
ON n.aid = w.aid AND n.year = w.year AND n.num_awards = w.num_person_awards
SELECT a.name AS awardname, a.year, w.personname
FROM (SELECT aid, year, COUNT(*) AS num_awards
FROM winners
GROUP BY aid, year
) AS n
JOIN (SELECT aid, year, personname, COUNT(*) AS num_person_awards
FROM winners
GROUP BY aid, year, personname
) AS w
ON n.aid = w.aid AND n.year = w.year AND n.num_awards = w.num_person_awards
JOIN awards AS a
ON a.aid = n.aid;
I've not experimented with whether this is faster than your query, but it looks simpler, so I think it has a reasonable chance of working faster.
This is how I'd format your query:
SELECT nominees.personname AS personname, awards.name AS award, nominees.year AS year
FROM nominees
JOIN institutions ON institutions.iid = nominees.iid
JOIN awards ON awards.aid = nominees.aid
WHERE personname IS NOT NULL
GROUP BY nominees.personname, awards.name, nominees.year
HAVING (awards.name, COUNT(DISTINCT institutions.name) IN
(SELECT awards.name AS awards, COUNT(DISTINCT institutions.name)
FROM nominees, awards, institutions
WHERE nominees.aid = awards.aid and nominees.iid = institutions.iid
GROUP BY awards.name)
ORDER BY nominees.personname, awards.name;
Upvotes: 3