user3171906
user3171906

Reputation: 583

sql query for merging two tables and then applying count

I have two data sets paper_data and paper_author

paper_author:

paper_id author_id
    1    521630
    1    972575
    1   1528710
    2    521630
    2   1682088
    3   1682088 

paper_data:

paper_id paper_year
    1       2009
    2       2007
    3       1963
    4       2005
    5       1997

I want to find the no of distinct years for which authors have written paper, that is for example:

author_id   paper_id  paper_year    distinct_paper_year_count  
521630        1,2      2009,2007        2
972575         1        2009            1 
1528710        1        2009            1  
1682088       2,3      2007,1963        2

So I want the final result as:

author_id    distinct_paper_year_count
521630             2
972575             1
1528710            1
1682088            2

I am able to get to:

author_id paper_year
521630      2009
972575      2009
.....

by running a simple query:

statement<-"select paper_author.author_id,paper_data.paper_year 
       from paper_author,paper_data
       where paper_author.paper_id=paper_data.paper_id"

But then I am stuck.How could this be done?

Thanks

Upvotes: 1

Views: 41

Answers (2)

Assuming your left table is paper_author, you need to do a left join with the paper_data table in order to get the intended result. Also, you should use the 'select' query along with the 'count' function using the 'distinct' keyword to get only the distinct count of paper_year. Finally, you must use the 'group by' clause to group your result according to the author_id in paper_author.

So here goes the query:

select pa.author_id, count(distinct pd.paper_year)
from dbo.paper_author as pa
join dbo.paper_data as pd
on pa.paper_id = pd.paper_id
group by pa.author_id

You can check the following sqlfiddle link to verify the result: http://sqlfiddle.com/#!3/e5d6e/1

Upvotes: 0

user330315
user330315

Reputation:

This should do it:

select paper_author.author_id,
       count(distinct paper_data.paper_year) as distinct_paper_year_count
from paper_author
  join paper_data on paper_author.paper_id = paper_data.paper_id
group by paper_author.author_id

Note that I replaced the outdated implicit join in the where clause by an explicit JOIN condition which is preferred over the implicit one.

Upvotes: 1

Related Questions