Reputation: 3
here is a picture of the schema: https://i.sstatic.net/yr26g.png
I am trying to create a query to find the authors with the highest number of publications to ICDE, who did not publish in SIGMOD.
I think I am on the right track with getting the number of authors with highest publications to ICDE:
select AUTHOR.NAME, aid, count(PUBLICATION.id)
from PUBLICATION, AUTHORPUBLICATION, CONFERENCE, AUTHOR
where aid=AUTHOR.id and pid=PUBLICATION.id and cid=CONFERENCE.id and CONFERENCE.name='ICDE'
group by AUTHOR.name, aid order by count(PUBLICATION.id) desc;
but have no idea how to form the query so that it excludes authors that have published in SIGMOD
among the things I have tried:
with aut(n,a,i) as
((select AUTHOR.NAME, aid, count(PUBLICATION.id) as cnt
from PUBLICATION, AUTHORPUBLICATION, CONFERENCE, AUTHOR
where aid=AUTHOR.id and pid=PUBLICATION.id and cid=CONFERENCE.id and CONFERENCE.name='ICDE'
group by AUTHOR.NAME, aid)
minus
(select AUTHOR.NAME, aid, count(PUBLICATION.id) as cnt
from PUBLICATION, AUTHORPUBLICATION, CONFERENCE, AUTHOR
where aid=AUTHOR.id and pid=PUBLICATION.id and cid=CONFERENCE.id and CONFERENCE.name='SIGMOD'
group by AUTHOR.NAME, aid))
select n,a,i from aut order by i desc;
but it doesn't seem to be working at all. I have a list of
author, id, paper_id
or
list of author, id, count(paper)
(I've tried it both ways) and because of the paper IDs I think I have a disjoint set and cannot get this to work. Have spent 3 hours already :(
I also tried something getting rid of the paper ids:
with aut(n,a) as
((select AUTHOR.NAME, aid
from PUBLICATION, AUTHORPUBLICATION, CONFERENCE, AUTHOR
where aid=AUTHOR.id and pid=PUBLICATION.id and cid=CONFERENCE.id and CONFERENCE.name='ICDE'
order by count(PUBLICATION.id) desc)
minus
(select AUTHOR.NAME, aid
from PUBLICATION, AUTHORPUBLICATION, CONFERENCE, AUTHOR
where aid=AUTHOR.id and pid=PUBLICATION.id and cid=CONFERENCE.id and CONFERENCE.name='SIGMOD'
order by count(PUBLICATION.id) desc))
select n,a from aut;
but it says i'm missing a parenthesis? I'm not.
Upvotes: 0
Views: 134
Reputation: 13248
I think this would be much better written using conditional aggregation:
select au.name,
au.id,
sum(decode(co.name, 'ICDE', 1, 0)) as num_publications
from publication pu
join authorpublication ap
on pu.id = ap.pid
join author au
on au.id = ap.aid
join conference co
on co.id = pu.cid
where co.name in ('ICDE', 'SIGMOD')
group by au.name, au.id
having sum(decode(co.name, 'SIGMOD', 1, 0)) = 0
order by num_publications desc
This filters in on both conferences, but counts only ICDE in the SELECT list, and filters out any authors with publication(s) at the SIGMOD conference via the HAVING clause.
As another option, while I believe it would be less efficient, you could also outer join w/ an inline view that selects the authors to be excluded, and then filter in on where the result is null (no match with the inline view):
select au.name, au.id, count(pu.id) as num_publications
from publication pu
join authorpublication ap
on pu.id = ap.pid
join author au
on au.id = ap.aid
join conference co
on co.id = pu.cid
left join (select au.id
from publication pu
join authorpublication ap
on pu.id = ap.pid
join author au
on au.id = ap.aid
join conference co
on co.id = pu.cid
where co.name = 'SIGMOD') x
on x.id = au.id
where co.name = 'ICDE'
and x.id is null
group by au.name, au.id
order by num_publications desc
Upvotes: 1
Reputation: 20804
Your second attempt was close. Try something like this;
select etc
from etc
where author_id in
(select the author id's you want
minus
select the author id's you don't want)
Upvotes: 0