clweeks
clweeks

Reputation: 875

Can I improve this query's performance or get the same result through a more performant process?

I wrote a fairly simple stored procedure that accepts some data, selects a value, inserts a couple records and then returns that value. But execution is too long in our production environment where I might eventually want it to run a few hundred thousand times in a day and it's adversely affecting other processes even when we're only running it say 30000 times.

I started by looking at the queries and adding an index on the date field that's used in where clauses. Then I ran SQL Server Profiler -- feeding the results into Tuning Advisor and implementing the indexing suggestions that it came up with. In the past, I've seen that tool call for really ugly indexes but this time it just wanted a single addition which made sense, so I added it. Each of those steps helped. But still too slow.

It was easy to figure out that the first query was the holdup, not the two inserts where are nearly instantaneous. So here's what I had at that time, including subquery run-times:

--all combined, this typically takes in the range of 1200-1500 ms but occasionally spikes up to ~2200 ms
select coalesce(
    (   
        --when the following is run, this takes ~690 ms
        select MIN(maxes.imbsn) 
        from (
            --when the following is run without the higher limiting scopes, this takes ~3600 ms
            select imbsn, MAX(assignmentDate) maxAD 
            from imbsnAssignments 
            group by imbsn
            ) maxes
        where datediff(d, maxes.maxAD, GETDATE()) > 90
    )
    ,
    (
        --this is rarely executed but takes ~0 ms when it is
        select max(imbsn)+1 from imbsnAssignments
    )
)

Based on those times, it seemed like the coalesce was mucking things up (this is a thing that I imagine I could verify with the execution plan if I had ever figured out how to read it, but I haven't -- plans remain largely opaque to my poor brain). To get rid of the coalesce, I changed up the query to:

--this runs ~480-700 ms
select MIN(maxes.imbsn) 
from (
    select imbsn, MAX(assignmentDate) maxAD 
    from imbsnAssignments 
    group by imbsn
        union
    select max(imbsn)+1, getDate() 
    from imbsnAssignments
    ) maxes
where datediff(d, maxes.maxAD, GETDATE()) > 90

which is a big improvement. But it's still pretty slow.

I verified that Profiler-Tuning Advisor still didn't want me to make any changes before coming here to ask you all about it. Which I think leaves me with two approaches: 1) maintain the basic algorithm but squeeze greater efficiency out of it or 2) switch to some smarter way of obtaining the same basic effect through methods that I don't know of but one will be obvious to one of you big-brains who sniff out that I'm engaged in some kind of anti-pattern here.

Thanks in advance for your time and attention!


I'm not precisely sure what the expected format for this additional information is, but I'll give it a try. The table is:

CREATE TABLE [dbo].[imbsnAssignments](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [imbsn] [int] NOT NULL,
    [assignmentDate] [date] NOT NULL,
    [jobCode] [varchar](10) NOT NULL,
    [name] [varchar](45) NOT NULL,
    [a1] [varchar](45) NOT NULL,
    [a2] [varchar](45) NOT NULL,
    [a3] [varchar](45) NOT NULL,
    [a4] [varchar](45) NOT NULL,
    [city] [varchar](40) NOT NULL,
    [state] [char](10) NOT NULL,
    [zip] [varchar](10) NOT NULL,
    [batchIdent] [varchar](256) NOT NULL,
 CONSTRAINT [PK_imbsnAssignments] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

and I suspect this isn't the 'right' way to show the indexes but:

 TableName            IndexName                                          IndexType

 imbsnAssignments     PK_imbsnAssignments                                CLUSTERED
 imbsnAssignments     IX_imbsnAssignments_assignmentDate                 NONCLUSTERED
 imbsnAssignments     _dta_index_imbsnAssignments_36_149575571__K2_3     NONCLUSTERED

Execution plan:

  |--Stream Aggregate(DEFINE:([Expr1013]=MIN([partialagg1018])))
       |--Concatenation
            |--Stream Aggregate(DEFINE:([partialagg1018]=MIN([IMB].[dbo].[imbsnAssignments].[imbsn])))
            |    |--Filter(WHERE:([Expr1003]<dateadd(day,(-90),getdate())))
            |         |--Stream Aggregate(GROUP BY:([IMB].[dbo].[imbsnAssignments].[imbsn]) DEFINE:([Expr1003]=MAX([IMB].[dbo].[imbsnAssignments].[assignmentDate])))
            |              |--Index Scan(OBJECT:([IMB].[dbo].[imbsnAssignments].[_dta_index_imbsnAssignments_36_149575571__K2_3]), ORDERED FORWARD)
            |--Stream Aggregate(DEFINE:([partialagg1018]=MIN([Expr1009])))
                 |--Compute Scalar(DEFINE:([Expr1009]=[Expr1008]+(1)))
                      |--Stream Aggregate(DEFINE:([Expr1008]=MAX([IMB].[dbo].[imbsnAssignments].[imbsn])))
                           |--Top(TOP EXPRESSION:((1)))
                                |--Index Scan(OBJECT:([IMB].[dbo].[imbsnAssignments].[_dta_index_imbsnAssignments_36_149575571__K2_3]), ORDERED BACKWARD)

Upvotes: 0

Views: 75

Answers (1)

Jason Carter
Jason Carter

Reputation: 925

First I would suggest taking your where clause closer to the actual table in your second example, and I would check the execution plan to see what its doing, maybe calculate what 90 days back is check it, rather than DATEDIFF every single maxAD value. And just lets change the union to a union all so it doesn't have to check for uniqueness between the datasets.

select MIN(maxes.imbsn) 
from (
    select imbsn, MAX(assignmentDate) maxAD 
    from imbsnAssignments 
    group by imbsn
    having max(assignmentdate) < dateadd(-90, d, GETDATE())
        union all
    select max(imbsn)+1, getDate() 
    from imbsnAssignments
) maxes

Upvotes: 1

Related Questions