jbcedge
jbcedge

Reputation: 19515

Select top 10 records for each category

I want to return top 10 records from each section in one query. Can anyone help with how to do it? Section is one of the columns in the table.

Database is SQL Server 2005. I want to return the top 10 by date entered. Sections are business, local, and feature. For one particular date I want only the top (10) business rows (most recent entry), the top (10) local rows, and the top (10) features.

Upvotes: 295

Views: 698089

Answers (16)

Anssssss
Anssssss

Reputation: 3262

This is a slight refinement (syntax only, swapping out a CASE statement for a GREATEST function) to Vadim Loboda's answer above.

DECLARE @N INT = 3; --the top "N" items to retrieve
WITH items(dt, category) AS
(
          SELECT '2000-01-01', 'group1' 
    UNION SELECT '2000-01-02', 'group1' 
    UNION SELECT '2000-01-03', 'group1' 
    UNION SELECT '2000-01-04', 'group1' 
    UNION SELECT '2000-01-05', 'group1' 
    UNION SELECT '1999-01-01', 'group2' 
    UNION SELECT '1999-01-02', 'group2' 
    UNION SELECT '1999-01-03', 'group2'
    UNION SELECT '1999-01-04', 'group2'
    UNION SELECT '2000-01-01', 'group3' 
)
SELECT TOP 1 WITH TIES 
    category, 
    dt
FROM items
ORDER BY GREATEST(0, ROW_NUMBER() OVER (PARTITION BY category ORDER BY dt ASC) - @N)

ROW_NUMBER is applied to each (partitioned by whatever we are grouping by), then subtract N from it so that it acquires a negative value for the first items, then the GREATEST function will instead return zero. All the rows with zero will then "tie" and TOP WITH TIES will give you just those rows.

Upvotes: 0

dkellner
dkellner

Reputation: 9986

Note: I know the OP has only 3 groups but this is a known general problem for many developers and there's no really good solution in SQL. So let me show you another way.


IN THEORY:

You can write it as one query. That is, formally one query, but it contans either subqueries or a self-JOIN which makes it in fact multiple operations under the hood. So you might as well just select each group individually.


IN PRACTICE:

If you want a performant solution, you need to work a bit more. Let's say you have 100 employees, you have 26 buildings from A to Z, people move around them (enter/leave), and you need the last 5 events for every building.

EVENT_ID   EVENT_TIME            EMPOYEE_ID   EVENT_CODE   BUILDING                           
-------------------------------------------------------------------------                                
883691     2023-03-29 11:00:00   92           enter        A                                           
883690     2023-03-29 11:00:21   78           enter        C                                           
883689     2023-03-29 11:00:25   58           enter        A                                           
883688     2023-03-29 11:02:10   22           leave        H                                           
883687     2023-03-29 11:31:42   73           leave        P                                           
...
...

You want to avoid 26 queries.

Here's what you can do:

  1. Write a query with a simple ORDER BY EVENT_ID DESC (or EVENT_TIME DESC), to get the last N events for all buildings.
  2. Set N (the limit) to a reasonable estimate that will have data for most buildings, you don't need to have all of them but the more the better. Let's say LIMIT 5000.
  3. Process the result set on the application side, to see which buildings don't have the top 10 in it.
  4. For those buildings, run separate queries to get their top 10.

For theorists, this is an anti-pattern. But the first query will have almost the same performance as one of the single-building ones, and probably brings you most of what you need; a few buildings will be missing, depending on how the employees normally move. Then, you may need 5 more queries for those buildings, and some merging on the application level.

In short: get a result that's almost complete, then make it complete.

So if you need performance, this is one way. If you need clarity of the business logic, well, choose any other answer. This one is scary. But when you go for speed, you often need scary techniques.

Upvotes: 1

lorond
lorond

Reputation: 3896

SELECT r.*
FROM
(
    SELECT
        r.*,
        ROW_NUMBER() OVER(PARTITION BY r.[SectionID]
                          ORDER BY r.[DateEntered] DESC) rn
    FROM [Records] r
) r
WHERE r.rn <= 10
ORDER BY r.[DateEntered] DESC

Upvotes: 48

Vadim Loboda
Vadim Loboda

Reputation: 3111

If we use SQL Server >= 2005, then we can solve the task with one select only:

declare @t table (
    Id      int ,
    Section int,
    Moment  date
);

insert into @t values
(   1   ,   1   , '2014-01-01'),
(   2   ,   1   , '2014-01-02'),
(   3   ,   1   , '2014-01-03'),
(   4   ,   1   , '2014-01-04'),
(   5   ,   1   , '2014-01-05'),

(   6   ,   2   , '2014-02-06'),
(   7   ,   2   , '2014-02-07'),
(   8   ,   2   , '2014-02-08'),
(   9   ,   2   , '2014-02-09'),
(   10  ,   2   , '2014-02-10'),

(   11  ,   3   , '2014-03-11'),
(   12  ,   3   , '2014-03-12'),
(   13  ,   3   , '2014-03-13'),
(   14  ,   3   , '2014-03-14'),
(   15  ,   3   , '2014-03-15');


-- TWO earliest records in each Section

select top 1 with ties
    Id, Section, Moment 
from
    @t
order by 
    case 
        when row_number() over(partition by Section order by Moment) <= 2 
        then 0 
        else 1 
    end;


-- THREE earliest records in each Section

select top 1 with ties
    Id, Section, Moment 
from
    @t
order by 
    case 
        when row_number() over(partition by Section order by Moment) <= 3 
        then 0 
        else 1 
    end;


-- three LATEST records in each Section

select top 1 with ties
    Id, Section, Moment 
from
    @t
order by 
    case 
        when row_number() over(partition by Section order by Moment desc) <= 3 
        then 0 
        else 1 
    end;

Upvotes: 19

Lukas Eder
Lukas Eder

Reputation: 221370

While the question was about SQL Server 2005, most people have moved on and if they do find this question, what could be the preferred answer in other situations is one using CROSS APPLY as illustrated in this blog post.

SELECT *
FROM t
CROSS APPLY (
  SELECT TOP 10 u.*
  FROM u
  WHERE u.t_id = t.t_id
  ORDER BY u.something DESC
) u

This query involves 2 tables. The OP's query only involves 1 table, in case of which a window function based solution might be more efficient.

Upvotes: 5

Raghu S
Raghu S

Reputation: 127

Tried the following and it worked with ties too.

SELECT rs.Field1,rs.Field2 
FROM (
    SELECT Field1,Field2, ROW_NUMBER() 
      OVER (Partition BY Section
            ORDER BY RankCriteria DESC ) AS Rank
    FROM table
    ) rs WHERE Rank <= 10

Upvotes: 9

Ali
Ali

Reputation: 343

You can try this approach. This query returns 10 most populated cities for each country.

   SELECT city, country, population
   FROM
   (SELECT city, country, population, 
   @country_rank := IF(@current_country = country, @country_rank + 1, 1) AS country_rank,
   @current_country := country 
   FROM cities
   ORDER BY country, population DESC
   ) ranked
   WHERE country_rank <= 10;

Upvotes: 1

Darrel Miller
Darrel Miller

Reputation: 142252

If you are using SQL 2005 you can do something like this...

SELECT rs.Field1,rs.Field2 
    FROM (
        SELECT Field1,Field2, Rank() 
          over (Partition BY Section
                ORDER BY RankCriteria DESC ) AS Rank
        FROM table
        ) rs WHERE Rank <= 10

If your RankCriteria has ties then you may return more than 10 rows and Matt's solution may be better for you.

Upvotes: 278

bharathreddy
bharathreddy

Reputation: 61

Q) Finding TOP X records from each group(Oracle)

SQL> select * from emp e 
  2  where e.empno in (select d.empno from emp d 
  3  where d.deptno=e.deptno and rownum<3)
  4  order by deptno
  5  ;

 EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO

  7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
  7839 KING       PRESIDENT            17-NOV-81       5000                    10
  7369 SMITH      CLERK           7902 17-DEC-80        800                    20
  7566 JONES      MANAGER         7839 02-APR-81       2975                    20
  7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
  7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30

6 rows selected.


Upvotes: 6

Craig Tullis
Craig Tullis

Reputation: 10517

If you want to produce output grouped by section, displaying only the top n records from each section something like this:

SECTION     SUBSECTION

deer        American Elk/Wapiti
deer        Chinese Water Deer
dog         Cocker Spaniel
dog         German Shephard
horse       Appaloosa
horse       Morgan

...then the following should work pretty generically with all SQL databases. If you want the top 10, just change the 2 to a 10 toward the end of the query.

select
    x1.section
    , x1.subsection
from example x1
where
    (
    select count(*)
    from example x2
    where x2.section = x1.section
    and x2.subsection <= x1.subsection
    ) <= 2
order by section, subsection;

To set up:

create table example ( id int, section varchar(25), subsection varchar(25) );

insert into example select 0, 'dog', 'Labrador Retriever';
insert into example select 1, 'deer', 'Whitetail';
insert into example select 2, 'horse', 'Morgan';
insert into example select 3, 'horse', 'Tarpan';
insert into example select 4, 'deer', 'Row';
insert into example select 5, 'horse', 'Appaloosa';
insert into example select 6, 'dog', 'German Shephard';
insert into example select 7, 'horse', 'Thoroughbred';
insert into example select 8, 'dog', 'Mutt';
insert into example select 9, 'horse', 'Welara Pony';
insert into example select 10, 'dog', 'Cocker Spaniel';
insert into example select 11, 'deer', 'American Elk/Wapiti';
insert into example select 12, 'horse', 'Shetland Pony';
insert into example select 13, 'deer', 'Chinese Water Deer';
insert into example select 14, 'deer', 'Fallow';

Upvotes: 6

Phil Rabbitt
Phil Rabbitt

Reputation: 1291

In T-SQL, I would do:

WITH TOPTEN AS (
    SELECT *, ROW_NUMBER() 
    over (
        PARTITION BY [group_by_field] 
        order by [prioritise_field]
    ) AS RowNo 
    FROM [table_name]
)
SELECT * FROM TOPTEN WHERE RowNo <= 10

Upvotes: 129

Diadistis
Diadistis

Reputation: 12174

I know this thread is a little bit old but I've just bumped into a similar problem (select the newest article from each category) and this is the solution I came up with :

WITH [TopCategoryArticles] AS (
    SELECT 
        [ArticleID],
        ROW_NUMBER() OVER (
            PARTITION BY [ArticleCategoryID]
            ORDER BY [ArticleDate] DESC
        ) AS [Order]
    FROM [dbo].[Articles]
)
SELECT [Articles].* 
FROM 
    [TopCategoryArticles] LEFT JOIN 
    [dbo].[Articles] ON
        [TopCategoryArticles].[ArticleID] = [Articles].[ArticleID]
WHERE [TopCategoryArticles].[Order] = 1

This is very similar to Darrel's solution but overcomes the RANK problem that might return more rows than intended.

Upvotes: 9

Bill Karwin
Bill Karwin

Reputation: 562931

I do it this way:

SELECT a.* FROM articles AS a
  LEFT JOIN articles AS a2 
    ON a.section = a2.section AND a.article_date <= a2.article_date
GROUP BY a.article_id
HAVING COUNT(*) <= 10;

update: This example of GROUP BY works in MySQL and SQLite only, because those databases are more permissive than standard SQL regarding GROUP BY. Most SQL implementations require that all columns in the select-list that aren't part of an aggregate expression are also in the GROUP BY.

Upvotes: 19

Blorgbeard
Blorgbeard

Reputation: 103565

If you know what the sections are, you can do:

select top 10 * from table where section=1
union
select top 10 * from table where section=2
union
select top 10 * from table where section=3

Upvotes: 15

Matt Hamilton
Matt Hamilton

Reputation: 204269

This works on SQL Server 2005 (edited to reflect your clarification):

select *
from Things t
where t.ThingID in (
    select top 10 ThingID
    from Things tt
    where tt.Section = t.Section and tt.ThingDate = @Date
    order by tt.DateEntered desc
    )
    and t.ThingDate = @Date
order by Section, DateEntered desc

Upvotes: 36

sblundy
sblundy

Reputation: 61434

Might the UNION operator work for you? Have one SELECT for each section, then UNION them together. Guess it would only work for a fixed number of sections though.

Upvotes: 4

Related Questions