sql query for merging two tables and then applying count

Question

I have two data sets paper_data and paper_author

paper_author:

paper_id author_id
    1    521630
    1    972575
    1   1528710
    2    521630
    2   1682088
    3   1682088

paper_data:

paper_id paper_year
    1       2009
    2       2007
    3       1963
    4       2005
    5       1997

I want to find the no of distinct years for which authors have written paper, that is for example:

author_id   paper_id  paper_year    distinct_paper_year_count  
521630        1,2      2009,2007        2
972575         1        2009            1 
1528710        1        2009            1  
1682088       2,3      2007,1963        2

So I want the final result as:

author_id    distinct_paper_year_count
521630             2
972575             1
1528710            1
1682088            2

I am able to get to:

author_id paper_year
521630      2009
972575      2009
.....

by running a simple query:

statement<-"select paper_author.author_id,paper_data.paper_year 
       from paper_author,paper_data
       where paper_author.paper_id=paper_data.paper_id"

But then I am stuck.How could this be done?

Thanks

user330315 · Accepted Answer

This should do it:

select paper_author.author_id,
       count(distinct paper_data.paper_year) as distinct_paper_year_count
from paper_author
  join paper_data on paper_author.paper_id = paper_data.paper_id
group by paper_author.author_id

Note that I replaced the outdated implicit join in the where clause by an explicit JOIN condition which is preferred over the implicit one.

sql query for merging two tables and then applying count

Answers (2)

Related Questions