Reputation: 11

How to optimize a select top N Query

I have a very large table, consisting of 40 million rows, in a SQL Server 2008 Database.

CREATE TABLE [dbo].[myTable](
    [ID] [bigint] NOT NULL,
    [CONTRACT_NUMBER] [varchar](50) NULL,
    [CUSTOMER_NAME] [varchar](200) NULL,
    [INVOICE_NUMBER] [varchar](50) NULL,
    [AGENCY] [varchar](50) NULL,
    [AMOUNT] [varchar](50) NULL,
    [INVOICE_MONTH] [int] NULL,
    [INVOICE_YEAR] [int] NULL,
    [Unique_ID] [bigint] NULL,
    [bar_code] [varchar](50) NOT NULL,
 CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED 
(
    [ID] ASC,
    [bar_code] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

I am trying to optimize performance for the following query:

SELECT top 35  ID,
            CONTRACT_NR,
            CUSTOMER_NAME,
            INVOICE_NUMBER,
            AMOUNT,
            AGENCY,
            CONTRACT_NUMBER,
            ISNULL([INVOICE_MONTH], 1) as [INVOICE_MONTH],
            ISNULL([INVOICE_YEAR], 1) as [INVOICE_YEAR],
            bar_code, 
            Unique_ID
            from MyTable 
WHERE 
CONTRACT_NUMBER like @CONTRACT_NUMBER and
INVOICE_NUMBER like @INVOICE_NUMBER and 
CUSTOMER_NAME like @CUSTOMER_NAME 
ORDER BY Unique_ID desc

In order to do that i build an included index on the columns CONTRACT_NUMBER, INVOICE_NUMBER and CUSTOMER_NAME.

CREATE NONCLUSTERED INDEX [ix_search_columns_without_uniqueid] ON [dbo].[MyTable] 
(
    [CONTRACT_NUMBER] ASC,
    [CUSTOMER_NAME] ASC,
    [INVOICE_NUMBER] ASC
)
INCLUDE ( [ID],
[AGENCY],
[AMOUNT],
[INVOICE_MONTH],
[INVOICE_YEAR],
[Unique_ID],
[Contract_nr],
[bar_code]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Still the query is taking from 3 sec to 10 sec to execute. From the query execution plan i see that an index seek operation is taking place consuming about 30% of the total workload and than a Sort (Top N) operation which is consuming the other 70%. Any idea how can i optimize this query, a response time of less than 1 sec is preferred? Note: I tried also to include dhe column [Unique_ID] in the index columns. In this case the query execution plan is doing an index scan, but with many users querying the database, i am having the same problem.

Upvotes: 1

Answers (2)

Elshan

Reputation: 7693

Check this page for more detail.

Update the statistic with a full scan to make the optimizer work easier.

UPDATE STATISTICS tablename WITH fullscan GO

Set statistics time on and execute the following query

SET STATISTICS time ON GO SELECT num_of_reads, num_of_bytes_read, num_of_writes, num_of_bytes_written FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1) GO SELECT TOP 100 c1, c2,c3 FROM yourtablename WHERE c1<30000 ORDER BY c2 GO SELECT num_of_reads, num_of_bytes_read, num_of_writes, num_of_bytes_written FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1) GO

Result

CPU time = 124 ms,  elapsed time = 91 ms
Before Query execution 
num_of_reads         num_of_bytes_read    num_of_writes     num_of_bytes_written
-------------------- -------------------- -------------------- --------------------
725864               46824931328          793589               51814416384
After Query execution  
num_of_reads         num_of_bytes_read    num_of_writes        num_of_bytes_written
-------------------- -------------------- -------------------- --------------------
725864               46824931328          793589               51814416384

Source : https://www.mssqltips.com/sqlservertip/2053/trick-to-optimize-top-clause-in-sql-server/

Upvotes: 1

davek

Reputation: 22925

Try and replace your clustered index (currently on two columns) with one solely on unique_id (assuming that it really is unique). This will aid your sorting. Then add a second covering index - as you have tried - on the three columns used in the WHERE. Check your statistics are upto date. Ihave a feeling that the column bar_code in your PK is preventing your sort from running as quickly as it could.

Do your variables contain wildcards?If they do,and they are leading wildcards, the index on the WHERE columns cannot be used. If they are not wildcarded, try a direct "=", assuming case-sensitivity is not an issue.

UPDATE: since you have leading wildcards, you will not be able to take advantage of an index on CONTRACT_NUMBER , INVOICE_NUMBER or CUSTOMER_NAME: as GriGrim suggested, the only alternative here is to use fulltext searches (CONTAINS keyword etc.).

Upvotes: 0

How to optimize a select top N Query

Answers (2)

Related Questions