Jonas
Jonas

Reputation: 128827

How to count two different columns in T-SQL?

I am playing with the StackOverflow datadump. Now I have a T-SQL Problem:

I can select a list with the number of questions per month and year with:

select datepart(year, posts.creationdate) as year,
datepart(month, posts.creationdate) as month, 
count(distinct posts.id) as questions
from posts
inner join posttags on posttags.postid = posts.id
inner join tags on tags.id = posttags.tagid
where posts.posttypeid = 1
group by datepart(month, posts.creationdate), 
datepart(year, posts.creationdate)
order by datepart(year, posts.creationdate), 
datepart(month, posts.creationdate)

If I add and tags.tagname = 'scala' on the WHERE-row, then I get the number of all "scala-questions". Is there any way I can show both the total number of questions and the number of questions containing a specific tag in the same result set (in different columns).

Because when I add the and tags.tagname = 'scala' I can no longer see the total number of questions per month.

Any ideas on how I can unit these resultsets into one?

Upvotes: 1

Views: 1658

Answers (2)

Mikael Eriksson
Mikael Eriksson

Reputation: 138960

If you use left outer join against posttags, count(posttags.tagid) will only count non null values. And since the left outer join only contain scala tags, you can skip the distinct in count(distinct posts.id).

select datepart(year, posts.creationdate) as year,
       datepart(month, posts.creationdate) as month,
       count(*) as questions,
       count(posttags.tagid) as sc
from posts
  left outer join posttags
    on posttags.postid = posts.id and
       posttags.tagid = (select id
                         from tags
                         where tagname = 'scala')
where posts.posttypeid = 1
group by datepart(month, posts.creationdate),
         datepart(year, posts.creationdate)
order by datepart(year, posts.creationdate),
         datepart(month, posts.creationdate)

Try here: https://data.stackexchange.com/stackoverflow/q/107948/

Upvotes: 2

ig0774
ig0774

Reputation: 41267

You'd need two queries to do that, since you have two sets of data (questions by month and scala questions by month). One possible solution is using common table expressions to create two "temporary views" of the data. As an example:

with total as (
    select datepart(year, posts.creationdate) as year,
           datepart(month, posts.creationdate) as month, 
           count(distinct posts.id) as questions
    from posts
        inner join posttags on posttags.postid = posts.id
        inner join tags on tags.id = posttags.tagid
    where posts.posttypeid = 1
    group by datepart(month, posts.creationdate), datepart(year, posts.creationdate)
), scala as (
    select datepart(year, posts.creationdate) as year,
           datepart(month, posts.creationdate) as month, 
           count(distinct posts.id) as questions
    from posts
        inner join posttags on posttags.postid = posts.id
        inner join tags on tags.id = posttags.tagid
     where posts.posttypeid = 1 and tags.tagname = 'scala'
    group by datepart(month, posts.creationdate), datepart(year, posts.creationdate)
)
select total.year, total.month, total.questions as total_questions, scala.questions as scala_questions
from total
    join scala on total.year = scala.year and total.month = scala.month
order by total.year, total.month​

The results of which can be seen here.

Upvotes: 2

Related Questions