Reputation: 1161
I am working on the stack overflow database and I would like to have the most popular tag per year. The output that I want is :
2016 Java 123456
2015 PHP 123456
2014 Java 123456
... ... ...
I have make this query :
SELECT tagName, annee, max(nbApparitions) AS nbApparitions
FROM
(SELECT tagName,
year(creationDate) AS annee,
count(tagName) AS nbApparitions
FROM tags
JOIN postTags
ON (tags.Id = postTags.tagId)
JOIN posts
ON (postTags.postId = posts.Id)
GROUP BY tagName, year(creationDate))Apparition
GROUP BY tagName , annee
But this query return all tags for each year. I don't know how to make this.
Upvotes: 1
Views: 40
Reputation: 1269973
You can use row_number()
:
SELECT tagname, annee, cnt
FROM (SELECT t.tagName, year(creationDate) AS annee,
count(*) AS cnt,
ROW_NUMBER() OVER (PARTITION BY year(creationDate) ORDER BY COUNT(*) DESC) AS seqnum
FROM tags t JOIN
postTags pt
ON t.Id = pt.tagId JOIN
posts p
ON pt.postId = p.Id
GROUP BY t.tagName, year(creationDate)
) pt
WHERE seqnum = 1;
In statistics, the most common element is called the mode. The above returns exactly one mode for each year. If there are ties, then an arbitrary one is chosen.
If you want all modes when there are ties, then use rank()
or dense_rank()
instead of row_number()
.
In English, "apparition" means "ghost", so the query is a bit funny. That's why I changed the column name to cnt
.
Upvotes: 1