csmba
csmba

Reputation: 4083

select top 1000, but know how many rows are there?

SQL Server 2005

I have 10 million rows in DB, and run a select (with lots of "where" and joints.. pretty complex). The results are presented in grid (think goolge results) and because of that, the user cannot possibly use more then 1000 results.

So I limit my SQL with a TOP 1000.

Problem: User still wants to know that there are 5432 results for his search.

Can I get that info without paying the price (or in other words, with still getting the speed benefits the "top 1000" is giving me ?)

Assumptions - Assume that the TOP 1000 brings 1000 out of 100K rows. So even the network price of moving 100K might be an issue.

Conclusions There is no free lunch! you can get the elegant way (accepted answer) but it still takes as long as the more expensive operation (i.e. counting all results). In real life, I will go with the 2 SQL approach, one to return top 1000 rows for display, and one that is ASYNC and updates some AJAX panel with the count(*) results that will take much much longer to computer

Upvotes: 3

Views: 7446

Answers (6)

Dave Markle
Dave Markle

Reputation: 97861

SELECT TOP 1000 x, y, z, COUNT(*) OVER () AS TotalCount
FROM dbo.table

Upvotes: 10

jon skulski
jon skulski

Reputation: 2305

This won't answer your questions, but I think this a good case where we implement the:

  • Hardware is cheap
  • Programmers are not cheap

Rule.

Yes doing two queries (one count, one select *) is not an optimal solution (you could do it in one query!). But how sure are you that this is going to be a bottleneck?

I get stuck on these questions and I have to remember this myself to continue developing.

Write it fast, write it better, PROFILE, then write it best.

Or, maybe this is the bottleneck and thats why you're asking. But it seems to me that if it was your bottleneck, a better optimization would to be restructure the database so that your query itself is simplified.

Upvotes: -2

Tom H
Tom H

Reputation: 47402

Since you're using SQL Server 2005 you get to use a CTE for this kind of query. Here's what I'm currently doing for a client:

;WITH Search_Results AS
(
     SELECT TOP(@system_max_rows)
          my_column1,
          my_column2,
          ROW_NUMBER() OVER
          (
               ORDER BY
                    -- Your order criteria here
          ) AS row_num,
          COUNT(my_column1) OVER (PARTITION BY '') As total_count
     FROM
          My_Table
     -- Put any joins here
     WHERE
          -- Put WHERE criteria here
)
SELECT
     my_column1,
     my_column2,
     row_num,
     total_count
FROM
     Search_Results
WHERE
     ((row_num - 1)/@rows_per_page) + 1 = CASE
                WHEN ((total_count - 1)/@rows_per_page) + 1 < @page_number THEN ((total_count - 1)/@rows_per_page) + 1
                ELSE @page_number
           END
OPTION (RECOMPILE)

The recompile is there because the search criteria can change drastically between calls to the stored procedure making a cached query plan bad. Hopefully the parameters/variables are obvious. This was for a paged search solution. The system max rows variable is hard-coded so that even the application can't override the maximum number of rows that can be returned and crash the server. To get the top 1000 you would pass in @page_number = 1 and @rows_per_page = 1000.

Upvotes: 0

BradC
BradC

Reputation: 39986

You may be over-estimating the performance benefit of the "top 1000", especially if there would only be ~5000 total results.

The server already has to do all the joins and stuff (which is typically the hard part), then has to order the result set the way you specified, THEN finally takes the 1000 first results.

Two options here:

1) Do one query with a Count(*) to get the count of results, then do a second query with your top 1000, retrieving the appropriate columns (as Neil suggests). OR
2) Retrieve all rows the first time, cache them in a result set, then only display 1000 rows to the user.

It might sound like the first would be faster, but the second only has to hit the database once, and depending on the specifics of your database and query, might be better (as long as the database isn't going to return 100,000 rows!)

Upvotes: 0

Filip Ekberg
Filip Ekberg

Reputation: 36327

You want to use "count" and group by, check out this reference: http://msdn.microsoft.com/en-us/library/ms175997.aspx

Also your problem seems to be posted here: http://www.eggheadcafe.com/software/aspnet/32427870/select-top-n-plus-a-count.aspx

Good Luck

Upvotes: 3

Neil Barnwell
Neil Barnwell

Reputation: 42165

Personally I'd opt for two statements hitting the database. One to retrieve the count, one to retrieve the first 1000 records.

You could run both queries in a batch to squeeze a little extra performance by saving a round-trip to the database.

-- Get the count
select count(*) from table where [criteria]

-- Get the data
select [cols] from table where [criteria]

Upvotes: 5

Related Questions