George Marshall
George Marshall

Reputation: 59

mysql - need two limits?

Given a table containing awards earned over several years by members of an organization that consists of multiple geographic districts, what mysql query would show the top award earner in each district? I can easily get the top ten award earners across all districts with this query:

SELECT Membername,  count(Award)as Number FROM awards 
GROUP BY Membername
ORDER BY Number desc 
LIMIT 10

But I need a list with the top earner for each district (there are about 90 of them), and I haven't gotten it right yet.

I tried this:

SELECT Membername,  District, count(Award)as Number FROM awards 
GROUP BY Membername, District
ORDER BY Number desc, District 
LIMIT 90

It has accurate counts for the members, but isn't showing one per district, so some districts show up more than once. How do I get it to list the top earner per district, with each district showing up just once?

Upvotes: 4

Views: 336

Answers (3)

symcbean
symcbean

Reputation: 48357

There's a page specifically dedicated to the problem - and if you look at the older manuals you'll see the max-concat trick - which is often still more efficient.

Upvotes: 0

DRapp
DRapp

Reputation: 48139

You'll have to do it by applying a "rank" per district, then only grab per rank = 1... The @LastDistrict at the join location is defaulted to zero, in case the district is based on an ID. If district is char based, you can just change it to = "" instead to match the data type.

To clarify what is happening. The "AwardCounts" pre query does the entire query per district and member with however many award counts. Then, ordered by district and member award count (descending), thus putting the highest award count at the first position per district.

That is joined to another bogus alias "SQLVars" which just creates inline variables to the query called @RankSeq and @LastDistrict. So, the first time in, the "DistRankSeq" will become a 1 for the first district, then prime the "@LastDistrict" with the value of the district. The next entry for the same district (since it will be in proper sequence order) will be assigned the rank of 2, then 3, etc... When there is a change from whatever the "LAST" District was to the new record being tested, the rank gets set back to 1 and starts over again. So you could have one district with 100 members, another with 5, another with 17...

So, your final query has all of them with their respective ranks... Now, apply the HAVING the final district rank = 1... Doing this, you could also adjust the having to get the top 3 members per district (for example)...

select
      AwardCounts.District,
      AwardCounts.MemberName,
      AwardCounts.memberAwards,
      @RankSeq := if( @LastDistrict = AwardCounts.District, @RankSeq +1, 1 ) DistRankSeq,
      @LastDistrict := AwardCounts.District as ignoreIt
   from
      ( select 
              a.district,
              a.membername,
              count(*) as memberAwards
           from
              Awards a
           group by
              a.district,
              a.membername
           order by
              a.district,
              memberAwards desc ) AwardCounts

      JOIN (select @RankSeq := 0, @LastDistrict = 0 ) SQLVars
   HAVING
      DistRankSeq = 1

EDIT PER FEEDBACK If its the aggregation thats taking the time, then I would do the following. Create a new table with nothing but the aggregations per district, name and initial rank for the district. As any new record is added to this table, the trigger then adds one to the aggregate table count, then checks where that person is within their district and re-updates its new rank position. You could take it a step further and have another table of just "TOP" member per district table that is one per district with the person's name. When a new person hits the top position, their name is put in the table, overwriting whoever was there last.

Upvotes: 2

Chris Nash
Chris Nash

Reputation: 3051

There's a fairly common way to do this, using self-joins. The trick is to replace a search for 'largest' with a search for 'those items with nothing bigger'. As you have already found out

SELECT Membername,  District, count(Award) as Number FROM awards
GROUP BY Membername, District

returns you a nice result of award counts. Let's write ... to save a bit of space as shorthand for that.

Now consider

SELECT a.Membername, a.District, a.Number FROM (...) a LEFT JOIN (...) b
ON a.District=b.District
AND a.Number<b.Number
WHERE b.Membername IS NULL

where the ... is that stuff written above. It's basically saying, for every entry in the award counts (a), find me all the entries (b) in the same district with more awards, and only return (a) if there aren't any (b)'s... in other words, a is the champ.

You will need to finesse this a bit if there's more than one member in the same district with the same winning count... this query will return all the tied members. You'll have to decide how to handle that. And watch out for those districts that don't have any awards at all... they won't even appear in your table.

Upvotes: 0

Related Questions