George
George

Reputation: 2213

Optimize SQL query, TSQL

I am a software developer and I was recently approached by DBA to optimize the query that an app of mine is using. DBA reported that query takes about 50% of CPU and high I/O operations when it runs. The query is pretty straight forward and I am unsure how to optimize it.

Question 1: How can I optimize this query?

Question 2: is it even my job to do so, shouldn't DBA be more knowledgeable in this? Mind you we have no DB developers, just DBA and Software Developers.

DB has approximately 30-50 million of records, it is constantly maintained/monitored by DBA, but I am unsure how. Server is on a dedicated machine and is Microsoft SQL Server 2005 - 9.00.5057.00 (X64)

PS: Please do not provide ways to improve DB by structural changes, I know it's a bad design to have currency stored as varchar, but it is what it is, we can't change DB structure, only queries accessing it.

Thank you for any insight.

Query:

SELECT
    COALESCE(CAST([PH].[PAmount] AS decimal(15, 2)) + CAST([PH].[Fee] AS decimal(15, 2)), 0.0) AS [PayAmount],
    [PH].[PDate] AS [PayDate]
FROM [History] AS [PH] WITH (NOLOCK)
WHERE [PH].[PMode] IN ('C', 'P')
    AND [PH].[INNO] = 'XYZ'
    AND [PH].[PStatus] IN ('CONSERVED', 'EXPECTING', 'REFRIGERATED', 'POSTPONED', 'FILED')
    AND [PH].[Locked] = 1
    AND [PH].[PDate] >= 'Jan 1, 2015'
ORDER BY [PH].[PDate] ASC

Fields:

PAmount - non-clustered index, varchar(50)

Fee - not indexed, decimal(6,2)

PDate - clustered index, datetime

PMode - non-clustered index, varchar(5)

INNO - non-clustered index, varchar(50)

PStatus - non-clustered index, varchar(50)

Locked - not indexed, bit

Execution plan: SELECT---Compute Scalar---Filter---NestedLoops-|--Index Seek (Inner Join) | cost 0% Cost 0% Cost 0% Cost 0% | cost 4% |---Key Lookup Cost 96%

Upvotes: 4

Views: 383

Answers (5)

Sean
Sean

Reputation: 492

I would see if I got better results with ISNULL instead of COALESCE.

The other thing is looking at the indexes. You listed the fields that are indexed. If those fields are covered by several indexes, I suggest making one good covering index for this query.

A covering index is one where all of the data needed by the query are contained in the index. If an index used by the query is not covering, then there is an extra trip (or trips) to the table to get the rest of the fields. It is more efficient if all of the data is right there in the query.

Check out these articles:

What are Covering Indexes and Covered Queries in SQL Server?

https://www.simple-talk.com/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/

For the data that is not part of a join or in the where clause, you can use the include keyword. Included fields are not searchable parts of the index, but will save the trip to the database.

Try the the index below. All of the fields in the where clause are part of the searchable part of the index, and all of the returned fields that are not part of the where clause are included. You might need to play with the order after looking at the execution plan, but I took my best guess.

Create Nonclustered Index Ix_Ncl_History_CoveringBigSelect on History(PDate, PMode, INNO, PStatus, Locked) Include (PAmount, Fee)

Here is an article about included columns.

Upvotes: 1

Joel Coehoorn
Joel Coehoorn

Reputation: 416179

It seems like you have a misconception about indexes. Indexes don't combine with each other, so it's not a question of having a column "indexed" or "not indexed". It's not good to have a separate index for individual columns. It's about having indexes with several columns that much up with individual queries. An index on a column won't help a query if it's still more efficient for the database to select on another column first.

I'm getting a little stale at this, but for this query I'd recommend an index that looks something like this:

CREATE NONCLUSTERED INDEX [ix_History_XXXXX] ON [History] 
(
    [INNO] ASC,
    [Locked] ASC,
    [PDate] ASC,
    [PMode] ASC
)
INCLUDE ( PStatus, PAmount, Fee)

You may want to swap around PDate, PMode, and PStatus, depending on their selectivity.

When building an index, you want to list the most specific items first. The general idea is that an index stores each successive item in order. With this index, rows for all of the XYZ values for INNO will be grouped together, and so the query engine can seek right to that section of the index. The next most specific column is Locked. Even though this is a bit value, because it is limited to exactly one value we are still able to seek directly to the one specific part of the index that will matter for the entire query. Again: I haven't had to do this kind of thing for a while, so you might do as well listed PMode here; I just don't recall whether the Sql Server query optimizer is smart enough to handle the two values in an efficient way.

From here on out the best option for the index depends on how much each of the query values limits the results. Since we're no longer able to get all of the results into one space, we're gonna have to scan the relevant parts of the index. My instinct here is to use the Date value next. This will allow the scan to walk the index starting with the first date that matches your result, and help it get the records in the correct order, but again: this is just my instinct. You may be able to do better by listing PMode or PStatus first.

Finally, the additional in the INCLUDES clause will allow you to entirely complete this query from the index, without actually going back to the full table. You use an INCLUDES clause rather than just appending the values to the query to avoid making Sql Server rebuild the index for updates to these columns. This is why PStatus, for example, probably should not be part of the main index, if the status is something that can change, and why you might be better off also leaving Locked out of the index. These are things you'll want to measure and test for yourself, though.

Upvotes: 3

cqi
cqi

Reputation: 539

As you said, I assume you can do nothing to the db, include indexing and structural changes. So what about the client App environment, is it powerful enough to do client side calculation?

If answer is yes, I suggest to move the calculation to the client side:

  • Do not cast data type in the query, cast varchar to decimal consume CPU resources. So get the result directly and do the convert job in your app.
  • For the IO issue, try to remove IN condition because IN essentially is a "OR" condition. So split your query into small pieces use "=" condition and send to your app, use your client app to "Union" them.

Upvotes: 0

Evaldas Buinauskas
Evaldas Buinauskas

Reputation: 14097

I would simply create index on the following table:

CREATE NONCLUSTERED INDEX idx_History_Locked_PMode_INNO_PStatus_PDate_iPAmount_iFee
    ON dbo.History (Locked, PMode, INNO, PStatus, PDate)
    INCLUDE (PAmount, Fee)
WHERE Locked = 1;                -- This is optional, can reduce index size.

This should improve your current query. All conditions should be met here.

Upvotes: 1

dev.null
dev.null

Reputation: 538

You are right, the query looks normal. It's a straight forward query, only with 'AND' clause and no "NOT NULL" constraint or joins or subselect. The conditions are mostly equal (only the date is relational). If the values in the conditions (like 'C', 'P', 1, 'XYZ', 'CONSERVED', etc.) are selective enough, than you (or the DBA) should define some indexes and the optimizer can use it. Ask the DBA to create an appropriate index for the table.

How many result lines are you expected to get? If there are a lot (e.g. >> 10,000), the ORDER BY clause could cost a lot.

Upvotes: 0

Related Questions