Reputation: 4083
SQL Server 2005
I have 10 million rows in DB, and run a select (with lots of "where" and joints.. pretty complex). The results are presented in grid (think goolge results) and because of that, the user cannot possibly use more then 1000 results.
So I limit my SQL with a TOP 1000.
Problem: User still wants to know that there are 5432 results for his search.
Can I get that info without paying the price (or in other words, with still getting the speed benefits the "top 1000" is giving me ?)
Assumptions - Assume that the TOP 1000 brings 1000 out of 100K rows. So even the network price of moving 100K might be an issue.
Conclusions There is no free lunch! you can get the elegant way (accepted answer) but it still takes as long as the more expensive operation (i.e. counting all results). In real life, I will go with the 2 SQL approach, one to return top 1000 rows for display, and one that is ASYNC and updates some AJAX panel with the count(*) results that will take much much longer to computer
Upvotes: 3
Views: 7446
Reputation: 97861
SELECT TOP 1000 x, y, z, COUNT(*) OVER () AS TotalCount
FROM dbo.table
Upvotes: 10
Reputation: 2305
This won't answer your questions, but I think this a good case where we implement the:
Rule.
Yes doing two queries (one count, one select *) is not an optimal solution (you could do it in one query!). But how sure are you that this is going to be a bottleneck?
I get stuck on these questions and I have to remember this myself to continue developing.
Write it fast, write it better, PROFILE, then write it best.
Or, maybe this is the bottleneck and thats why you're asking. But it seems to me that if it was your bottleneck, a better optimization would to be restructure the database so that your query itself is simplified.
Upvotes: -2
Reputation: 47402
Since you're using SQL Server 2005 you get to use a CTE for this kind of query. Here's what I'm currently doing for a client:
;WITH Search_Results AS
(
SELECT TOP(@system_max_rows)
my_column1,
my_column2,
ROW_NUMBER() OVER
(
ORDER BY
-- Your order criteria here
) AS row_num,
COUNT(my_column1) OVER (PARTITION BY '') As total_count
FROM
My_Table
-- Put any joins here
WHERE
-- Put WHERE criteria here
)
SELECT
my_column1,
my_column2,
row_num,
total_count
FROM
Search_Results
WHERE
((row_num - 1)/@rows_per_page) + 1 = CASE
WHEN ((total_count - 1)/@rows_per_page) + 1 < @page_number THEN ((total_count - 1)/@rows_per_page) + 1
ELSE @page_number
END
OPTION (RECOMPILE)
The recompile is there because the search criteria can change drastically between calls to the stored procedure making a cached query plan bad. Hopefully the parameters/variables are obvious. This was for a paged search solution. The system max rows variable is hard-coded so that even the application can't override the maximum number of rows that can be returned and crash the server. To get the top 1000 you would pass in @page_number = 1 and @rows_per_page = 1000.
Upvotes: 0
Reputation: 39986
You may be over-estimating the performance benefit of the "top 1000", especially if there would only be ~5000 total results.
The server already has to do all the joins and stuff (which is typically the hard part), then has to order the result set the way you specified, THEN finally takes the 1000 first results.
Two options here:
1) Do one query with a Count(*) to get the count of results, then do a second query with your top 1000, retrieving the appropriate columns (as Neil suggests). OR
2) Retrieve all rows the first time, cache them in a result set, then only display 1000 rows to the user.
It might sound like the first would be faster, but the second only has to hit the database once, and depending on the specifics of your database and query, might be better (as long as the database isn't going to return 100,000 rows!)
Upvotes: 0
Reputation: 36327
You want to use "count" and group by, check out this reference: http://msdn.microsoft.com/en-us/library/ms175997.aspx
Also your problem seems to be posted here: http://www.eggheadcafe.com/software/aspnet/32427870/select-top-n-plus-a-count.aspx
Good Luck
Upvotes: 3
Reputation: 42165
Personally I'd opt for two statements hitting the database. One to retrieve the count, one to retrieve the first 1000 records.
You could run both queries in a batch to squeeze a little extra performance by saving a round-trip to the database.
-- Get the count
select count(*) from table where [criteria]
-- Get the data
select [cols] from table where [criteria]
Upvotes: 5