George Rockbraker
George Rockbraker

Reputation: 3

How to use minus or where not exists

here is a picture of the schema: https://i.sstatic.net/yr26g.png

I am trying to create a query to find the authors with the highest number of publications to ICDE, who did not publish in SIGMOD.

I think I am on the right track with getting the number of authors with highest publications to ICDE:

select AUTHOR.NAME, aid, count(PUBLICATION.id)
from PUBLICATION, AUTHORPUBLICATION, CONFERENCE, AUTHOR
where aid=AUTHOR.id and pid=PUBLICATION.id and cid=CONFERENCE.id and CONFERENCE.name='ICDE'
group by AUTHOR.name, aid order by count(PUBLICATION.id) desc;

but have no idea how to form the query so that it excludes authors that have published in SIGMOD

among the things I have tried:

with aut(n,a,i) as
((select AUTHOR.NAME, aid, count(PUBLICATION.id) as cnt
from PUBLICATION, AUTHORPUBLICATION, CONFERENCE, AUTHOR
where aid=AUTHOR.id and pid=PUBLICATION.id and cid=CONFERENCE.id and CONFERENCE.name='ICDE'
group by AUTHOR.NAME, aid)
minus
(select AUTHOR.NAME, aid, count(PUBLICATION.id) as cnt
from PUBLICATION, AUTHORPUBLICATION, CONFERENCE, AUTHOR
where aid=AUTHOR.id and pid=PUBLICATION.id and cid=CONFERENCE.id and CONFERENCE.name='SIGMOD'
group by AUTHOR.NAME, aid))
select n,a,i from aut order by i desc;

but it doesn't seem to be working at all. I have a list of author, id, paper_id or list of author, id, count(paper) (I've tried it both ways) and because of the paper IDs I think I have a disjoint set and cannot get this to work. Have spent 3 hours already :(

I also tried something getting rid of the paper ids:

with aut(n,a) as
((select AUTHOR.NAME, aid
from PUBLICATION, AUTHORPUBLICATION, CONFERENCE, AUTHOR
where aid=AUTHOR.id and pid=PUBLICATION.id and cid=CONFERENCE.id and CONFERENCE.name='ICDE'
order by count(PUBLICATION.id) desc)
minus
(select AUTHOR.NAME, aid
from PUBLICATION, AUTHORPUBLICATION, CONFERENCE, AUTHOR
where aid=AUTHOR.id and pid=PUBLICATION.id and cid=CONFERENCE.id and CONFERENCE.name='SIGMOD'
order by count(PUBLICATION.id) desc))
select n,a from aut;

but it says i'm missing a parenthesis? I'm not.

Upvotes: 0

Views: 134

Answers (2)

Brian DeMilia
Brian DeMilia

Reputation: 13248

I think this would be much better written using conditional aggregation:

select au.name,
       au.id,
       sum(decode(co.name, 'ICDE', 1, 0)) as num_publications
  from publication pu
  join authorpublication ap
    on pu.id = ap.pid
  join author au
    on au.id = ap.aid
  join conference co
    on co.id = pu.cid
 where co.name in ('ICDE', 'SIGMOD')
 group by au.name, au.id
having sum(decode(co.name, 'SIGMOD', 1, 0)) = 0
 order by num_publications desc

This filters in on both conferences, but counts only ICDE in the SELECT list, and filters out any authors with publication(s) at the SIGMOD conference via the HAVING clause.

As another option, while I believe it would be less efficient, you could also outer join w/ an inline view that selects the authors to be excluded, and then filter in on where the result is null (no match with the inline view):

select au.name, au.id, count(pu.id) as num_publications
  from publication pu
  join authorpublication ap
    on pu.id = ap.pid
  join author au
    on au.id = ap.aid
  join conference co
    on co.id = pu.cid
  left join (select au.id
               from publication pu
               join authorpublication ap
                 on pu.id = ap.pid
               join author au
                 on au.id = ap.aid
               join conference co
                 on co.id = pu.cid
              where co.name = 'SIGMOD') x
    on x.id = au.id
 where co.name = 'ICDE'
   and x.id is null
 group by au.name, au.id
 order by num_publications desc

Upvotes: 1

Dan Bracuk
Dan Bracuk

Reputation: 20804

Your second attempt was close. Try something like this;

select etc
from etc
where author_id in 
(select the author id's you want
minus
select the author id's you don't want)

Upvotes: 0

Related Questions