Night Train
Night Train

Reputation: 781

SQL Efficiency maybe I should use Exist?

Using the following schema:

create table awards(
aid int primary key
, name varchar(100) not null );

create table institutions(
iid int primary key
, name varchar(100) not null );

create table winners(
aid int
, iid int
, year int
, filmname varchar(100)
, personname varchar(100)
, primary key (aid, iid, year)
, foreign key tid references awards(aid)
, foreign key cid references institutions(iid) );

I created the following query:

SELECT nominees.personname as personname, awards.name as award, nominees.year as year 

FROM nominees, institutions, awards WHERE institutions.iid = nominees.iid and 
awards.aid = nominees.aid and personname is not null 

GROUP BY nominees.personname, awards.name, nominees.year 

HAVING ((awards.name, count(DISTINCT institutions.name)) in 
(SELECT awards.name as 
awards, count(DISTINCT institutions.name) 
FROM nominees, awards, institutions 
WHERE nominees.aid = awards.aid and nominees.iid = institutions.iid 
GROUP BY awards.name)) 

ORDER BY nominees.personname, awards.name;

This query is designed to find all awards in which a single person has been nominated by every institution giving that award in a specific year. It essentially takes one person and counts the number of institutions that gave them a single award and compares that value to the max number of institutions that give that award.

The desired output should look something like the following:

"personname"    "award" "year"

"Alexandre"     "score" "2011"
"Skyfall"       "song"  "2013"
"Tangled"       "song"  "2011"

This gives the sets that I desire however I'm not sure whether or not it would be more efficient to do it a different way. I've attempted to get it to work with EXISTS but I didn't have much luck.

Main Question: Is there a more efficient way to do this query?

Upvotes: 1

Views: 91

Answers (1)

Jonathan Leffler
Jonathan Leffler

Reputation: 753725

As usual with complex queries, I use TDQD — Test-Driven Query Design — to solve the problem in stages. Each stage can be tested individually, and the results checked, ensuring that you reach the correct answer.

I note that you show us three tables; your query uses two of those but mentions a fourth, nominees. I am assuming that winners is the same as nominees, since you gave us that schema and asked about who won the award from every institution offering the award in a given year.

Stage 1: How many institutions gave a specific award in a year?

SELECT aid, year, COUNT(*) AS num_awards
  FROM winners
 GROUP BY aid, year;

Stage 2: How many times did a person gain a specific award in a year?

SELECT aid, year, personname, COUNT(*) AS num_person_awards
  FROM winners
 GROUP BY aid, year, personname;

Stage 3: Rows where the two counts are the same?

SELECT n.aid, n.year, w.personname
  FROM (SELECT aid, year, COUNT(*) AS num_awards
          FROM winners
         GROUP BY aid, year
       ) AS n
  JOIN (SELECT aid, year, personname, COUNT(*) AS num_person_awards
          FROM winners
         GROUP BY aid, year, personname
       ) AS w
    ON n.aid = w.aid AND n.year = w.year AND n.num_awards = w.num_person_awards

Stage 4: Replace award ID with award name in result set

SELECT a.name AS awardname, a.year, w.personname
  FROM (SELECT aid, year, COUNT(*) AS num_awards
          FROM winners
         GROUP BY aid, year
       ) AS n
  JOIN (SELECT aid, year, personname, COUNT(*) AS num_person_awards
          FROM winners
         GROUP BY aid, year, personname
       ) AS w
    ON n.aid = w.aid AND n.year = w.year AND n.num_awards = w.num_person_awards
  JOIN awards AS a
    ON a.aid = n.aid;

I've not experimented with whether this is faster than your query, but it looks simpler, so I think it has a reasonable chance of working faster.


This is how I'd format your query:

SELECT nominees.personname AS personname, awards.name AS award, nominees.year AS year
  FROM nominees
  JOIN institutions ON institutions.iid = nominees.iid
  JOIN awards ON awards.aid = nominees.aid
 WHERE personname IS NOT NULL 
 GROUP BY nominees.personname, awards.name, nominees.year 
HAVING (awards.name, COUNT(DISTINCT institutions.name) IN 
            (SELECT awards.name AS awards, COUNT(DISTINCT institutions.name) 
               FROM nominees, awards, institutions 
              WHERE nominees.aid = awards.aid and nominees.iid = institutions.iid 
              GROUP BY awards.name)
 ORDER BY nominees.personname, awards.name;

Upvotes: 3

Related Questions