Reputation: 2213
I am a software developer and I was recently approached by DBA to optimize the query that an app of mine is using. DBA reported that query takes about 50% of CPU and high I/O operations when it runs. The query is pretty straight forward and I am unsure how to optimize it.
Question 1: How can I optimize this query?
Question 2: is it even my job to do so, shouldn't DBA be more knowledgeable in this? Mind you we have no DB developers, just DBA and Software Developers.
DB has approximately 30-50 million of records, it is constantly maintained/monitored by DBA, but I am unsure how. Server is on a dedicated machine and is Microsoft SQL Server 2005 - 9.00.5057.00 (X64)
PS: Please do not provide ways to improve DB by structural changes, I know it's a bad design to have currency stored as varchar, but it is what it is, we can't change DB structure, only queries accessing it.
Thank you for any insight.
Query:
SELECT
COALESCE(CAST([PH].[PAmount] AS decimal(15, 2)) + CAST([PH].[Fee] AS decimal(15, 2)), 0.0) AS [PayAmount],
[PH].[PDate] AS [PayDate]
FROM [History] AS [PH] WITH (NOLOCK)
WHERE [PH].[PMode] IN ('C', 'P')
AND [PH].[INNO] = 'XYZ'
AND [PH].[PStatus] IN ('CONSERVED', 'EXPECTING', 'REFRIGERATED', 'POSTPONED', 'FILED')
AND [PH].[Locked] = 1
AND [PH].[PDate] >= 'Jan 1, 2015'
ORDER BY [PH].[PDate] ASC
Fields:
PAmount
- non-clustered index, varchar(50)
Fee
- not indexed, decimal(6,2)
PDate
- clustered index, datetime
PMode
- non-clustered index, varchar(5)
INNO
- non-clustered index, varchar(50)
PStatus
- non-clustered index, varchar(50)
Locked
- not indexed, bit
Execution plan:
SELECT---Compute Scalar---Filter---NestedLoops-|--Index Seek
(Inner Join) |
cost 0% Cost 0% Cost 0% Cost 0% | cost 4%
|---Key Lookup
Cost 96%
Upvotes: 4
Views: 383
Reputation: 492
I would see if I got better results with ISNULL instead of COALESCE.
The other thing is looking at the indexes. You listed the fields that are indexed. If those fields are covered by several indexes, I suggest making one good covering index for this query.
A covering index is one where all of the data needed by the query are contained in the index. If an index used by the query is not covering, then there is an extra trip (or trips) to the table to get the rest of the fields. It is more efficient if all of the data is right there in the query.
Check out these articles:
What are Covering Indexes and Covered Queries in SQL Server?
For the data that is not part of a join or in the where clause, you can use the include keyword. Included fields are not searchable parts of the index, but will save the trip to the database.
Try the the index below. All of the fields in the where clause are part of the searchable part of the index, and all of the returned fields that are not part of the where clause are included. You might need to play with the order after looking at the execution plan, but I took my best guess.
Create Nonclustered Index Ix_Ncl_History_CoveringBigSelect on History(PDate, PMode, INNO, PStatus, Locked) Include (PAmount, Fee)
Here is an article about included columns.
Upvotes: 1
Reputation: 416179
It seems like you have a misconception about indexes. Indexes don't combine with each other, so it's not a question of having a column "indexed" or "not indexed". It's not good to have a separate index for individual columns. It's about having indexes with several columns that much up with individual queries. An index on a column won't help a query if it's still more efficient for the database to select on another column first.
I'm getting a little stale at this, but for this query I'd recommend an index that looks something like this:
CREATE NONCLUSTERED INDEX [ix_History_XXXXX] ON [History]
(
[INNO] ASC,
[Locked] ASC,
[PDate] ASC,
[PMode] ASC
)
INCLUDE ( PStatus, PAmount, Fee)
You may want to swap around PDate, PMode, and PStatus, depending on their selectivity.
When building an index, you want to list the most specific items first. The general idea is that an index stores each successive item in order. With this index, rows for all of the XYZ
values for INNO
will be grouped together, and so the query engine can seek right to that section of the index. The next most specific column is Locked
. Even though this is a bit
value, because it is limited to exactly one value we are still able to seek directly to the one specific part of the index that will matter for the entire query. Again: I haven't had to do this kind of thing for a while, so you might do as well listed PMode
here; I just don't recall whether the Sql Server query optimizer is smart enough to handle the two values in an efficient way.
From here on out the best option for the index depends on how much each of the query values limits the results. Since we're no longer able to get all of the results into one space, we're gonna have to scan the relevant parts of the index. My instinct here is to use the Date
value next. This will allow the scan to walk the index starting with the first date that matches your result, and help it get the records in the correct order, but again: this is just my instinct. You may be able to do better by listing PMode or PStatus first.
Finally, the additional in the INCLUDES
clause will allow you to entirely complete this query from the index, without actually going back to the full table. You use an INCLUDES clause rather than just appending the values to the query to avoid making Sql Server rebuild the index for updates to these columns. This is why PStatus, for example, probably should not be part of the main index, if the status is something that can change, and why you might be better off also leaving Locked
out of the index. These are things you'll want to measure and test for yourself, though.
Upvotes: 3
Reputation: 539
As you said, I assume you can do nothing to the db, include indexing and structural changes. So what about the client App environment, is it powerful enough to do client side calculation?
If answer is yes, I suggest to move the calculation to the client side:
Upvotes: 0
Reputation: 14097
I would simply create index on the following table:
CREATE NONCLUSTERED INDEX idx_History_Locked_PMode_INNO_PStatus_PDate_iPAmount_iFee
ON dbo.History (Locked, PMode, INNO, PStatus, PDate)
INCLUDE (PAmount, Fee)
WHERE Locked = 1; -- This is optional, can reduce index size.
This should improve your current query. All conditions should be met here.
Upvotes: 1
Reputation: 538
You are right, the query looks normal. It's a straight forward query, only with 'AND' clause and no "NOT NULL" constraint or joins or subselect. The conditions are mostly equal (only the date is relational). If the values in the conditions (like 'C', 'P', 1, 'XYZ', 'CONSERVED', etc.) are selective enough, than you (or the DBA) should define some indexes and the optimizer can use it. Ask the DBA to create an appropriate index for the table.
How many result lines are you expected to get? If there are a lot (e.g. >> 10,000), the ORDER BY clause could cost a lot.
Upvotes: 0