user149691
user149691

Reputation: 607

T-SQL. What is the best approach to implement "any" where statement

Imagine you have a table of orders with statuses:

Id (PK)
Status (INT) 
... 

There is user interface (UI) where business user may select what status for orders he is interested in. There is a magic number "-1" that means - user is interested in every status

What is the best way and considerations how to implement the query:

1. select * from orders where status = @user-selected-status-id or @user-selected-status-id = -1

2. select * from orders where status = ISNULL(NULLIF(@user-selected-status-id, -1), status) 

3. Dynamic sql approach with string concatenation "where" operator is joined only if required (EXEC sp_executesql at the very end) 

I used "Execution plan" for all three approaches and it tells that estimated cost is the very best for dynamic sql, then isnull/nullif and "or" is outsider.

I used to think that "or" approach should outperform because of it's brievity isnull/nullif approach

What amused me even more - that results (estimations) depend on size of order table. When it's grew ten times (from 10K -> 100K rows) outsider was "dynamic sql".

May I trust this?

Upvotes: 2

Views: 73

Answers (1)

Thom A
Thom A

Reputation: 95659

Gail Shaw did a couple of great articles on this: Catch-all Queries and Revisting Catch-all Queries.

The summary of these, however, is that is you are on 2008- then go with parametrised dynamic statements. If you are on SQL Server 2012+ then go with the WHERE Column = @Variable OR @Variable IS NULL and add OPTION (RECOMPILE) to the end.

If, however, you do have an expensive query in terms of generating the query plan, then you might want to still use dynamic SQL. Then, each of the statements will have their own cached plan, provided they are called often enough.

What ever though, don't go with status = ISNULL(NULLIF(@user-selected-status-id, -1), status). That clause is non-SARGable and will have the worst perfomance of the bunch.

So, the 2012+ method (for a non high complexity query plan in terms of generation) would be:

SELECT {Columns}
FROM YourTable YT
WHERE (YT.Column = @Variable
   OR  @Variable IS NULL)
OPTION (RECOMPILE);

OPTION (RECOMPILE) is important, as it causes the data engine to recreate the query plan (and estimates) each time the query is run. If you have a query where @Variable has a value, then the number of rows that is going to be returned is going to be estimated as "low" (respectively). On the other hand, for a query where the value of @Variable is NULL then that means that every row is going to be returned (in this simple query); or a "high" number of rows. SQL Server caches plans, which means that a plan for when there is a low number would not be good for when there is a high number. Bad estimates can (and do) mean poor performance as things can overspill from RAM in tempdb, and resource time isn't what was expected. Forcing the data engine to recreate the plan means those estimates will be recreated for each run (which does come at a cost), but will be far better for the query being run; provided your statistics are up to date too.

If, however, you're on SQL Server 2008, you don't have access to OPTION (RECOMPILE), so you'll need to go down the dynamic route:

DECLARE @SQL nvarchar(MAX);

SET @SQL = N'SELECT {Columns}' + NCHAR(13) + NCHAR(10) +
           N'FROM YourTable' + NCHAR(13) + NCHAR(10) +
           CASE WHEN @Variable IS NOT NULL THEN N'WHERE YT.Column = @Variable' ELSE '' END + N';';

EXEC sp_executesql @SQL, N'@Variable {DataType}', @Variable = @Variable;

Upvotes: 3

Related Questions