Karthik
Karthik

Reputation: 1074

How to do order by and group by in this scenario?

I need to select TOP 3 Sports based on TotalUsers by DESC and group them by Individual Sports.

What I've done till now is

SELECT *
FROM (
    SELECT R.Sports, R.RoomID ,R.Name,
     COUNT(C.ChatUserLogId) AS TotalUsers,
     ROW_NUMBER()
      OVER (PARTITION BY R.SPORTS ORDER BY R.SPORTS DESC ) AS Rank
    FROM Room R JOIN ChatUserLog C
    ON R.RoomID = C.RoomId
    GROUP BY 
     R.RoomID,
     R.Name,
     R.Sports                
    ) rs WHERE Rank  IN (1, 2, 3)   
ORDER BY Sports, TotalUsers DESC

Below is the output of the SQL

Sports                                             RoomID      Name                                                                                                 TotalUsers  Rank
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aerobics                                           6670        Aerobic vs. Anaerobic Exercise: Which is Best to Burn more Fat?                                      17          1
Aerobics                                           7922        Is it okay to be fat if you’re fit?                                                                  13          2
Aerobics                                           6669        What is the best time of the day to do an aerobic work out?                                          7           3
Archery                                            7924        Who were the best archers in history?                                                                8           1
Archery                                            7925        Should I get into shooting or archery?                                                               7           2
Archery                                            7926        What advantages, if any, do arrows have over bullets?                                                6           3
Badminton                                          6678        Which is more challenging, physically and mentally: badminton or tennis?                             9           1
Badminton                                          6677        Who is your favorite - Lee chong wei or Lin dan?                                                     8           2
Badminton                                          6794        Which single athlete most changed the sport?                                                         7           3
Billiards                                          6691        How to get great at billiards?                                                                       34          1
Billiards                                          6692        Why is Efren Reyes the greatest billiards and pool player of all time?                               31          2
Boxing                                             6697        Mike Tyson: The greatest heavyweight of all time?                                                    13          1
Boxing                                             6700        Who is considered the greatest boxer of all time? Why?                                               13          2
Boxing                                             6699        What is the greatest, most exciting boxing fight of all-time?                                        12          3

But my query does not solve my requirement. I need the output something like below. The below output selects the TotalUsers and groups them by Sports.

Sports      TotalUsers
-----------------------
Billiards   34
Billiards   31
Aerobics    17
Aerobics    13
Aerobics    7
Boxing      13
Boxing      13
Boxing      12

Any help is appreciated.

Upvotes: 1

Views: 124

Answers (3)

drf
drf

Reputation: 8699

Your code looks very close, but there appear to be three issues.

Over clause

There looks to be an error in your OVER clause:

 ROW_NUMBER() OVER(PARTITION BY R.SPORTS ORDER BY R.SPORTS DESC)

The PARTITION BY statement is correct in restarting the ranking for each partition. However, within each partition you are ordering by the partition criteria, which is nondeterministic (R.SPORTS will necessarily be equal for each value in the partition, so ORDER BY will have no effect). What you want, instead, is to order by the total users. The statement is then:

 ROW_NUMBER() OVER(PARTITION BY R.SPORTS ORDER BY COUNT(C.CHATUSERLOGID) DESC)

(You can also use RANK() in place of ROW_NUMBER if you want to rooms with equal number of users to have the same ranking.)

Final query ordering

The question indicates you are seeking to order the result set as follows:

  • First, by sport; sports should be ordered by the largest room within that category
  • Second, the top 3 rooms for each sport in descending order of size

The first criteria requires a new column in your inner select statement: for each room, what was the highest number of users for any room for that sport? This can be written as:

MAX(COUNT(C.CHATUSERLOGID)) OVER (PARTITION BY R.SPORTS) MaxSportsUsers

With this column available, you can order by MaxSportsUsers descending followed by Rank ascending.

Limiting to top 3 sports: a problem arises

The question solution indicates you only want the top three sports, ranked by the number of users in its top room. Thus, you need to do a ranking of the form:

RANK() OVER (PARTITION BY CATEGORY ORDER BY MAX(COUNT(USERID)) OVER (PARTITION BY CATEGORY)) CategoryTop

But SQL Server does not support this, and attempting it will raise the error "Windowed functions cannot be used in the context of another windowed function or aggregate".

There are a few alternatives. As one, note that if we run SELECT TOP 3 SPORT, MAX(TotalUsers) MaxUsers FROM RS ORDER BY 2 DESC against the inner query (rs), the query will produce the top three sports and highest user count. Joining these records against RS on Sport will limit the final output to the top three sports.

This approach requires that RS to be referenced from an inner join. To do so, it's necessary to convert the nested query (SELECT * FROM (SELECT...) rs) to Common Table Expression form (WITH RS AS (SELECT...) SELECT * FROM RS). This allows a query of the form WITH RS AS (SELECT...) SELECT * FROM RS JOIN (SELECT... FROM RS) R2...

Once the query is in CTE format, we can join on the CTE query, i.e., INNER JOIN (SELECT TOP 3 SPORT, MAX(TOTALUSERS) MaxSportsUsers FROM RS GROUP BY SPORT ORDER BY 2 DESC) RS2 ON RS2.SPORT = RS.SPORT), keeping the ORDER BY clause the same. The inner join will limit the final dataset to the top 3 sports.

With the MaxSportsUsers column moved to the inner join, it can be removed from RS (formerly the inner query).

Final query

Combining the above, we get the final query:

WITH RS AS
(
SELECT R.Sports, R.RoomID ,R.Name,
             COUNT(C.ChatUserLogId) AS TotalUsers,
             ROW_NUMBER() OVER (PARTITION BY R.SPORTS ORDER BY COUNT(C.ChatUserLogId) DESC ) AS Rank
      FROM Room R
      JOIN ChatUserLog C ON R.RoomID = C.RoomId
      GROUP BY R.RoomID, R.Name,  R.Sports
)
SELECT rs.Sports, rs.TotalUsers
FROM rs
INNER JOIN (
    SELECT TOP 3 SPORTS, MAX(TOTALUSERS) MaxSportsUsers FROM RS GROUP BY SPORTS ORDER BY 2 DESC
) RS2 ON RS2.SPORTS = RS.SPORTS
WHERE Rank IN (1, 2, 3)   
ORDER BY MaxSportsUsers DESC, RANK;

Upvotes: 1

Dan Bracuk
Dan Bracuk

Reputation: 20794

If you want the top 3, start by getting the top 3. Something like this:

with top3Sports as (
select top 3 sports, count(chatUserLogId) users
from room r join chatUserLog c on r.roomId = c.roomId
group by sports 
order by count(chatUserLogId) desc
)

select the fields you need
from top3Sports join other tables etc

It's a lot simpler than the approach you tried. Bear in mind, however, that no matter what approach you take, ties will mess you up.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269503

From the description of your desired data, you appear to only want to select two columns from the subquery:

SELECT rs.Sports, rs.TotalUsers
FROM (SELECT R.Sports, R.RoomID ,R.Name,
             COUNT(C.ChatUserLogId) AS TotalUsers,
             ROW_NUMBER() OVER (PARTITION BY R.SPORTS ORDER BY R.SPORTS DESC ) AS Rank
      FROM Room R JOIN
           ChatUserLog C
           ON R.RoomID = C.RoomId
      GROUP BY R.RoomID, R.Name,  R.Sports                
      ) rs
WHERE Rank IN (1, 2, 3)   
ORDER BY Sports, TotalUsers DESC;

The only change is that the outer query selects the two columns you want.

Upvotes: 1

Related Questions