Reputation: 4797
I have a query optimization problem. Let's say there is a table which has all the invoices. Using a TVP (Table Valued Parameter) I'd like to select few records by providing 1..n ids, or return all records by providing a single id with the value of -1.
DECLARE @InvoiceIdSet AS dbo.TBIGINT;
INSERT INTO @InvoiceIdSet VALUES (1),(2),(3),(4)
--INSERT INTO @InvoiceIdSet VALUES (-1)
SELECT TOP 100
I.Id ,
Number ,
DueDate ,
IssuedDate ,
Amount ,
Test3
FROM dbo.Invoices I
--WHERE EXISTS ( SELECT NULL
-- FROM @InvoiceIdSet
-- WHERE I.Id = ID
-- OR ID = -1 )
--CROSS APPLY @InvoiceIdSet s WHERE i.Id = s.ID OR s.ID = -1
JOIN @InvoiceIdSet S ON S.ID = I.Id OR S.ID=-1
Regardless of which method of selection I use, the query performs quite efficiently, until I start using the OR operator, at which point it starts taking a very long time to return few records, but all records are being returned real fast.
Any pointers and suggestions will be highly appreciated.
The first plan is without OR, the second is with OR.
UPDATE: After fiddling with different options, I have arrived at this solution as the fastest performing, regardless of number of parameters.
First change the UserDefinedTableType to include a primary key index:
CREATE TYPE [dbo].[TBIGINT] AS TABLE(
[ID] [bigint] NOT NULL PRIMARY KEY CLUSTERED
)
The select statement now looks like this:
SELECT TOP 100
I.Id ,
Number ,
DueDate ,
IssuedDate ,
Amount ,
Test3
FROM dbo.Invoices I
WHERE I.ID IN ( SELECT S.ID
FROM @InvoiceIdSet S
WHERE S.ID <> -1
UNION ALL
SELECT S.ID
FROM dbo.Invoices S
WHERE EXISTS ( SELECT NULL
FROM @InvoiceIdSet
WHERE ID = -1 ) )
The plans got much bigger, but performance is almost constant, between few (first plan) and all (second plan) records.
As you can see the plans are now identical and return the required records in less than a second from 1M rows.
I'd love to hear what the community thinks about this solution.
Thanks everyone for your help.
Upvotes: 4
Views: 613
Reputation: 4797
I am going to accept my own answer here:
DECLARE @InvoiceIdSet AS TBIGINT
--INSERT INTO @InvoiceIdSet
--VALUES ( 1 ),
-- ( 2 ),
-- ( 3 ),
-- ( 4 )
INSERT INTO @InvoiceIdSet VALUES ( -1 )
SELECT TOP 100
I.Id ,
Number ,
DueDate ,
IssuedDate ,
Amount ,
Test3
FROM dbo.Invoices I
WHERE I.ID IN ( SELECT S.ID
FROM @InvoiceIdSet S
WHERE NOT EXISTS ( SELECT NULL
FROM @InvoiceIdSet
WHERE ID = -1 )
UNION ALL
SELECT S.ID
FROM dbo.Invoices S
WHERE EXISTS ( SELECT NULL
FROM @InvoiceIdSet
WHERE ID = -1 ) )
It performs well for ALL and SOME scenarios.
Upvotes: 1
Reputation: 973
It looks like you are doing a trick here by supplying a minus 1 as a parameter as a way of getting everything.
I'm assuming this is in a Stored Proc or something, so maybe supply a null as a parameter in this scenario and try the following;
DECLARE @IDparam int
SELECT TOP 100
I.Id ,
Number ,
DueDate ,
IssuedDate ,
Amount ,
Test3
FROM dbo.Invoices I
JOIN @InvoiceIdSet S ON S.ID = I.Id AND COALESCE(@IDparam, I.Id) = I.Id
If @IDParam
is null, it will use I.Id
in the where clause. Might speed it up.
Upvotes: 0
Reputation: 419
If or S.ID=-1
is added, the SQL Server knows that the condition is true for each row; therefore, the query plan will use Scan as in your second plan.
As Martin Smith says in the comment, the SQL Server isn't clever enough in this situation. You need to either have 2 queries (one if -1 is there, other if only some rows are selected). That way, the SQL Server can produce 2 plans and both of them will be optimal for the scenarios they cover.
You could also go with recompilation (but RECOMPILE will be done always, which is a waste of the resources, in general). Or you can construct the query dynamically. Dynamically would mean you'd generate only 2 queries and both of them will be cached, so no recompilation but then caution how it's written so it isn't vulnerable to SQL injection.
Thanks
Upvotes: 1