Darek
Darek

Reputation: 4797

Efficient way to select few or all records from a table

I have a query optimization problem. Let's say there is a table which has all the invoices. Using a TVP (Table Valued Parameter) I'd like to select few records by providing 1..n ids, or return all records by providing a single id with the value of -1.

    DECLARE @InvoiceIdSet AS dbo.TBIGINT;
    INSERT INTO @InvoiceIdSet VALUES (1),(2),(3),(4)
    --INSERT INTO @InvoiceIdSet VALUES (-1)

    SELECT TOP 100
            I.Id ,
            Number ,
            DueDate ,
            IssuedDate ,
            Amount ,
            Test3
    FROM    dbo.Invoices I
    --WHERE   EXISTS ( SELECT NULL
    --                 FROM   @InvoiceIdSet
    --                 WHERE  I.Id = ID
    --                        OR ID = -1 )
    --CROSS APPLY @InvoiceIdSet s WHERE i.Id = s.ID OR s.ID = -1
    JOIN @InvoiceIdSet S ON S.ID = I.Id OR S.ID=-1

Regardless of which method of selection I use, the query performs quite efficiently, until I start using the OR operator, at which point it starts taking a very long time to return few records, but all records are being returned real fast.

Any pointers and suggestions will be highly appreciated.

Without OR

With OR

The first plan is without OR, the second is with OR.

UPDATE: After fiddling with different options, I have arrived at this solution as the fastest performing, regardless of number of parameters.

First change the UserDefinedTableType to include a primary key index:

CREATE TYPE [dbo].[TBIGINT] AS TABLE(
[ID] [bigint] NOT NULL PRIMARY KEY CLUSTERED
)

The select statement now looks like this:

SELECT TOP 100
    I.Id ,
    Number ,
    DueDate ,
    IssuedDate ,
    Amount ,
    Test3
FROM    dbo.Invoices I
WHERE   I.ID IN ( SELECT    S.ID
              FROM      @InvoiceIdSet S
              WHERE     S.ID <> -1
              UNION ALL
              SELECT    S.ID
              FROM      dbo.Invoices S
              WHERE     EXISTS ( SELECT NULL
                                 FROM   @InvoiceIdSet
                                 WHERE  ID = -1 ) )

The plans got much bigger, but performance is almost constant, between few (first plan) and all (second plan) records.

Few Records

All Records

As you can see the plans are now identical and return the required records in less than a second from 1M rows.

I'd love to hear what the community thinks about this solution.

Thanks everyone for your help.

Upvotes: 4

Views: 613

Answers (3)

Darek
Darek

Reputation: 4797

I am going to accept my own answer here:

DECLARE @InvoiceIdSet AS TBIGINT
--INSERT  INTO @InvoiceIdSet
--VALUES  ( 1 ),
--        ( 2 ),
--        ( 3 ),
--        ( 4 )
INSERT  INTO @InvoiceIdSet VALUES  ( -1 )

SELECT TOP 100
        I.Id ,
        Number ,
        DueDate ,
        IssuedDate ,
        Amount ,
        Test3
FROM    dbo.Invoices I
WHERE   I.ID IN ( SELECT    S.ID
                  FROM      @InvoiceIdSet S
                  WHERE     NOT EXISTS ( SELECT NULL
                                         FROM   @InvoiceIdSet
                                         WHERE  ID = -1 )
                  UNION ALL
                  SELECT    S.ID
                  FROM      dbo.Invoices S
                  WHERE     EXISTS ( SELECT NULL
                                     FROM   @InvoiceIdSet
                                     WHERE  ID = -1 ) )

It performs well for ALL and SOME scenarios.

Upvotes: 1

richardb
richardb

Reputation: 973

It looks like you are doing a trick here by supplying a minus 1 as a parameter as a way of getting everything.

I'm assuming this is in a Stored Proc or something, so maybe supply a null as a parameter in this scenario and try the following;

DECLARE @IDparam int

SELECT TOP 100
            I.Id ,
            Number ,
            DueDate ,
            IssuedDate ,
            Amount ,
            Test3
    FROM    dbo.Invoices I
    JOIN @InvoiceIdSet S ON S.ID = I.Id AND COALESCE(@IDparam, I.Id) = I.Id

If @IDParam is null, it will use I.Id in the where clause. Might speed it up.

Upvotes: 0

E.K.
E.K.

Reputation: 419

If or S.ID=-1 is added, the SQL Server knows that the condition is true for each row; therefore, the query plan will use Scan as in your second plan. As Martin Smith says in the comment, the SQL Server isn't clever enough in this situation. You need to either have 2 queries (one if -1 is there, other if only some rows are selected). That way, the SQL Server can produce 2 plans and both of them will be optimal for the scenarios they cover. You could also go with recompilation (but RECOMPILE will be done always, which is a waste of the resources, in general). Or you can construct the query dynamically. Dynamically would mean you'd generate only 2 queries and both of them will be cached, so no recompilation but then caution how it's written so it isn't vulnerable to SQL injection.

Thanks

Upvotes: 1

Related Questions