Mattasse
Mattasse

Reputation: 1161

How to get the max between several value group per year

I am working on the stack overflow database and I would like to have the most popular tag per year. The output that I want is :

2016 Java 123456
2015 PHP  123456
2014 Java 123456
...  ...  ...

I have make this query :

SELECT tagName, annee, max(nbApparitions) AS nbApparitions
FROM  
    (SELECT tagName,
     year(creationDate) AS annee,
     count(tagName) AS nbApparitions
     FROM tags
     JOIN postTags 
     ON (tags.Id = postTags.tagId)
     JOIN posts
     ON (postTags.postId = posts.Id)
     GROUP BY tagName, year(creationDate))Apparition
 GROUP BY tagName , annee

But this query return all tags for each year. I don't know how to make this.

Upvotes: 1

Views: 40

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269973

You can use row_number():

SELECT tagname, annee, cnt
FROM (SELECT t.tagName, year(creationDate) AS annee,
             count(*) AS cnt,
             ROW_NUMBER() OVER (PARTITION BY year(creationDate) ORDER BY COUNT(*) DESC) AS seqnum
      FROM tags t JOIN
           postTags pt
           ON t.Id = pt.tagId JOIN
           posts p
           ON pt.postId = p.Id
      GROUP BY t.tagName, year(creationDate)
     ) pt
WHERE seqnum = 1;

In statistics, the most common element is called the mode. The above returns exactly one mode for each year. If there are ties, then an arbitrary one is chosen.

If you want all modes when there are ties, then use rank() or dense_rank() instead of row_number().

In English, "apparition" means "ghost", so the query is a bit funny. That's why I changed the column name to cnt.

Upvotes: 1

Related Questions