SQL Server Query intermittent performance Issue

Question

Recently we have run into performance issues with a particular query on SQL Server (2016). The problem I'm seeing is that the performance issues are incredibly inconsistent and I'm not sure how to improve this.

The table details:

CREATE TABLE ContactRecord 
(
  ContactSeq BIGINT NOT NULL 
, ApplicationCd VARCHAR(2) NOT NULL 
, StartDt DATETIME2 NOT NULL 
, EndDt DATETIME2 
, EndStateCd VARCHAR(3) 
, UserId VARCHAR(10) 
, UserTypeCd VARCHAR(2) 
, LineId VARCHAR(3) 
, CallingLineId VARCHAR(20) 
, DialledLineId VARCHAR(20) 
, ChannelCd VARCHAR(2) 
, SubChannelCd VARCHAR(2) 
, ServicingAgentCd VARCHAR(7) 
, EucCopyTimestamp VARCHAR(30) 
, PRIMARY KEY (ContactSeq)
, FOREIGN KEY (ApplicationCd) REFERENCES ApplicationType(ApplicationCd)
, FOREIGN KEY (EndStateCd) REFERENCES EndStateType(EndStateCd)
, FOREIGN KEY (UserTypeCd) REFERENCES UserType(UserTypeCd)
)

CREATE TABLE TransactionRecord 
(
  TransactionSeq BIGINT NOT NULL 
, ContactSeq BIGINT NOT NULL 
, TransactionTypeCd VARCHAR(3) NOT NULL 
, TransactionDt DATETIME2 NOT NULL 
, PolicyId VARCHAR(10) 
, ProductId VARCHAR(7) 
, EucCopyTimestamp VARCHAR(30) 
, Detail VARCHAR(1000) 
, PRIMARY KEY (TransactionSeq)
, FOREIGN KEY (ContactSeq) REFERENCES ContactRecord(ContactSeq)
, FOREIGN KEY (TransactionTypeCd) REFERENCES TransactionType(TransactionTypeCd)
)

Current record counts:

ContactRecord 20million
TransactionRecord 90million

My query is:

select
   UserId,
   max(StartDt) as LastLoginDate 
from
   ContactRecord 
where
   ContactSeq in 
   (
      select
         ContactSeq 
      from
         TransactionRecord 
      where
         ContactSeq in 
         (
            select
               ContactSeq 
            from
               ContactRecord 
            where
               UserId in 
               (
                  '1234567890',
                  '1234567891' -- Etc.
               )
         )
         and TransactionRecord.TransactionTypeCd not in 
         (
            '122'
         )
   )
   and ApplicationCd not in 
   (
      '1',
      '4',
      '5'
   )
group by
   UserId;

Now the query isn't great and could be improved using joins, however it does fundamentally work. The problem I'm having is that our data job takes an input of roughly 7100 userIds. These are then broken up into groups of 500. For each 500 these are used in the IN clause in this query. The first 14 executions of this query with 500 items in the IN clause execute fine. Results are returned in roughly 15-20 seconds for each.

The issue is with the remaining 100 give or take for the last execution of this query. It never seems to complete. It just hangs. In our data job it is timing out after 10 minutes. I have no idea why. I'm not an expert with SQL Server so I'm not really sure how to debug this. I have executed each sub query independently and then replaced the contents of the sub query with the returned data. Doing this for each sub query works fine.

Any help is really appreciated here as I'm at a loss to how this works so consistently with larger amounts of parameters but just doesn't work with only a fraction.

EDIT

I've got three example of execution plans here. Please note that each of these were executed on a test server and all executed almost instantly as there is very little data on this test equivalent.

This is the execution plan for 500 arguments which executes fine in production, returning in roughly 15-20 seconds:

This is the execution plan for 119 arguments which is timing out in our data job after 10 minutes:

This is the execution plan for 5 arguments which executes fine. This query is not explicitly being executed in the data job but just for comparison:

In all instances SSMS has given the following warning:

/*
Missing Index Details from SQLQuery2.sql
The Query Processor estimates that implementing the following index could improve the query cost by 26.3459%.
*/

/*
USE [CloasIvr]
GO
CREATE NONCLUSTERED INDEX []
ON [dbo].[TransactionRecord] ([TransactionTypeCd])
INCLUDE ([ContactSeq])
GO
*/

Is this the root cause to this problem?

seanb · Accepted Answer

Without seeing what's going on, it is hard to know exactly what's going on - especially with the ones that are failing. Execution plans for 'good' runs can help a bit but we're just guessing what's going wrong in bad runs.

My initial guess (similar to my comment) is that the estimates of what it expects is very wrong, and it creates a plan that is very bad.

Your TransactionRecord table in particular, with the detail column that is 1000 characters, would could have big issues with an unexpected large number of nested loops.

Indexes

The first thing I would suggest is indexing - particularly to a) only include a subset of the data you need for these, and b) to have them ordered in a useful manner.

I suggest the following two indexes would appear to help

CREATE INDEX IX_ContactRecord_User ON ContactRecord 
    (UserId, ContactSeq) 
    INCLUDE (ApplicationCD, Startdt);

CREATE INDEX IX_TransactionRecord_ContactSeq ON TransactionRecord 
    (ContactSeq, TransactionTypeCd);

These are both 'covering indexes', as well as being sorted in ways that can help. Alternatively, you could replace the first one with a slightly modified version (sorting first on ContactSeq) but I think the above version would be more useful.

CREATE INDEX IX_ContactRecord_User2 ON ContactRecord 
    (ContactSeq) 
    INCLUDE (ApplicationCD, Startdt, UserId);

Also, regarding the index on TransactionRecord - if this is the only query that would be using that index, you could improve it by creating the following index instead

CREATE INDEX IX_TransactionRecord_ContactSeq_Filtered ON TransactionRecord
    (ContactSeq, TransactionTypeCd) 
    WHERE (TransactionTypeCD <> '122');

The above is a filtered index that matches what's specified in the WHERE clause of your statement. The big thing about this is that it has already a) removed the records where the type <> '122', and b) has sorted the records already on ContactSeq so it's then easy to look them up.

By the way - given you asked about adding indexes on Foreign Keys on principle - the use of these really depends on how you read the data. If you are only ever referring to the referenced table (e.g., you have an FK to a status table, and only ever use it to report, in English, the statuses) then an index on the original table's Status_ID wouldn't help. On the other hand, if you want to find all the rows with Status_ID = 4, then it would help.

To help understanding indexes, I strongly recommend Brent Ozar's How to think like an SQL Server Engine - it really helped me to understand how indexes work in practice.

Use a sorted temp table

This may help but is unlikely to be the primary fix. If you pre-load the relevant UserIDs into a temporary table (with a primary key on UserID) then it may help with the relevant JOIN. It may also be easier for you to modify each run rather than have to modify the middle of the query.

CREATE TABLE #Users (UserId VARCHAR(10) PRIMARY KEY);
INSERT INTO #Users (UserID) VALUES
('1234567890'),
('1234567891');

Then replace the middle section of your query with

      where
         ContactSeq in 
         (
            select
               ContactSeq 
            from
               ContactRecord CR
               INNER JOIN #Users U ON CR.UserID = U.UserID 
         )
         and TransactionRecord.TransactionTypeCd not in 
         (
            '122'
         )

Simplify the query

I had a go at simplifying the query, and got it to this:

select  CR.UserId,
        max(CR.StartDt) as LastLoginDate 
from    ContactRecord CR
        INNER JOIN TransactionRecord TR ON CR.ContactSeq = TR.ContactSeq
where   TR.TransactionTypeCd not in ('122')
        AND CR.ApplicationCd not in ('1', '4', '5')
        AND CR.UserId in ('1234567890', '1234567891') -- etc
group by UserId;

or alternatively (with the temp table)

select  CR.UserId,
        max(CR.StartDt) as LastLoginDate 
from    ContactRecord CR
        INNER JOIN #Users U ON CR.UserID = U.UserID 
        INNER JOIN TransactionRecord TR ON CR.ContactSeq = TR.ContactSeq
where   TR.TransactionTypeCd not in ('122')
        AND CR.ApplicationCd not in ('1', '4', '5')
group by UserId;

One advantage of simplifying the query, is that it also helps SQL Server get good estimates; which in turn help it get good execution plans.

Of course, you would need to test that the above returns exactly the same records in your circumstances - I don't have a data set to test on, so I cannot be 100% sure these simplified versions match the original.

SQL Server Query intermittent performance Issue

Answers (1)

Related Questions