Reputation: 583
I have two data sets paper_data and paper_author
paper_author:
paper_id author_id
1 521630
1 972575
1 1528710
2 521630
2 1682088
3 1682088
paper_data:
paper_id paper_year
1 2009
2 2007
3 1963
4 2005
5 1997
I want to find the no of distinct years for which authors have written paper, that is for example:
author_id paper_id paper_year distinct_paper_year_count
521630 1,2 2009,2007 2
972575 1 2009 1
1528710 1 2009 1
1682088 2,3 2007,1963 2
So I want the final result as:
author_id distinct_paper_year_count
521630 2
972575 1
1528710 1
1682088 2
I am able to get to:
author_id paper_year
521630 2009
972575 2009
.....
by running a simple query:
statement<-"select paper_author.author_id,paper_data.paper_year
from paper_author,paper_data
where paper_author.paper_id=paper_data.paper_id"
But then I am stuck.How could this be done?
Thanks
Upvotes: 1
Views: 41
Reputation: 1316
Assuming your left table is paper_author, you need to do a left join with the paper_data table in order to get the intended result. Also, you should use the 'select' query along with the 'count' function using the 'distinct' keyword to get only the distinct count of paper_year. Finally, you must use the 'group by' clause to group your result according to the author_id in paper_author.
So here goes the query:
select pa.author_id, count(distinct pd.paper_year)
from dbo.paper_author as pa
join dbo.paper_data as pd
on pa.paper_id = pd.paper_id
group by pa.author_id
You can check the following sqlfiddle link to verify the result: http://sqlfiddle.com/#!3/e5d6e/1
Upvotes: 0
Reputation:
This should do it:
select paper_author.author_id,
count(distinct paper_data.paper_year) as distinct_paper_year_count
from paper_author
join paper_data on paper_author.paper_id = paper_data.paper_id
group by paper_author.author_id
Note that I replaced the outdated implicit join in the where clause by an explicit JOIN
condition which is preferred over the implicit one.
Upvotes: 1