Why does changing the WHERE clause to use a variable make the query 4 times slower?

Question

I am inserting data from the "Tags" table in the "Recovery" database into the "Tags" table in the "R3" database.

Both databases live on the same SQL Server instance on my laptop.

I have built the insert query, and because the Recovery..Tags table has around 180M records I decided to break it into smaller subsets (1 million records at a time).

Here is my query (let's call it Query A):

insert into R3..Tags (iID,DT,RepID,Tag,xmiID,iBegin,iEnd,Confidence,Polarity,Uncertainty,Conditional,Generic,HistoryOf,CodingScheme,Code,CUI,TUI,PreferredText,ValueBegin,ValueEnd,Value,Deleted,sKey,RepType)
SELECT T.iID,T.DT,T.RepID,T.Tag,T.xmiID,T.iBegin,T.iEnd,T.Confidence,T.Polarity,T.Uncertainty,T.Conditional,T.Generic,T.HistoryOf,T.CodingScheme,T.Code,T.CUI,T.TUI,T.PreferredText,T.ValueBegin,T.ValueEnd,T.Value,T.Deleted,T.sKey,R.RepType
FROM Recovery..tags  T inner join Recovery..Reps R on T.RepID = R.RepID
where T.iID between 13000001 and 14000000

It takes around 2 minutes, which is OK.

To make things a bit easier for myself, I replaced the hard-coded iID values in the WHERE clause with a variable, so my query looks like this (let's call it Query B):

declare @i int = 12

insert into R3..Tags (iID,DT,RepID,Tag,xmiID,iBegin,iEnd,Confidence,Polarity,Uncertainty,Conditional,Generic,HistoryOf,CodingScheme,Code,CUI,TUI,PreferredText,ValueBegin,ValueEnd,Value,Deleted,sKey,RepType)
SELECT T.iID,T.DT,T.RepID,T.Tag,T.xmiID,T.iBegin,T.iEnd,T.Confidence,T.Polarity,T.Uncertainty,T.Conditional,T.Generic,T.HistoryOf,T.CodingScheme,T.Code,T.CUI,T.TUI,T.PreferredText,T.ValueBegin,T.ValueEnd,T.Value,T.Deleted,T.sKey,R.RepType
FROM Recovery..tags  T inner join Recovery..Reps R on T.RepID = R.RepID
where T.iID between (1000000 * @i) + 1 and (@i+1)*1000000

But that causes the insert to become much slower (around 10 minutes).

So I tried Query A again, and it took around 2 minutes.

I tried Query B again, and it took around 8 minutes!

I am attaching the execution plan for each one (on a site that shows an analysis of the query plan): Query A Plan and Query B Plan.

Any idea why this is happening, and how to fix it?

seanb · Accepted Answer

The big difference in time is due to the very different plans that are being created to join Tags and Reps.

Fundamentally, in version A, the optimizer knows how much data is being extracted (a million rows) and can design an efficient plan for that. However, because you are using variables in B to define the range being imported, it has to build a more generic plan: one that would work for 10 rows, a million rows, or a hundred million rows.
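One way to check this explanation is to let the optimizer see the variable values when it compiles the statement. The following is a minimal sketch, not something taken from the original plans: the helper variables @lo and @hi and the OPTION (RECOMPILE) hint are my additions, and I am assuming iID is an int column. With that hint, SQL Server compiles the statement on each execution using the current values of the local variables, so the row estimate for B should be much closer to the one in A. Whether it reproduces A's plan exactly on your data is something you would need to confirm.

```sql
-- Sketch only: Query B with precomputed bounds and a recompile hint.
-- @lo, @hi and OPTION (RECOMPILE) are additions, not part of the original query.
-- Assumes iID is an int column.
DECLARE @i  int = 12;
DECLARE @lo int = (1000000 * @i) + 1;   -- 12000001
DECLARE @hi int = (@i + 1) * 1000000;   -- 13000000

INSERT INTO R3..Tags (iID,DT,RepID,Tag,xmiID,iBegin,iEnd,Confidence,Polarity,Uncertainty,Conditional,Generic,HistoryOf,CodingScheme,Code,CUI,TUI,PreferredText,ValueBegin,ValueEnd,Value,Deleted,sKey,RepType)
SELECT T.iID,T.DT,T.RepID,T.Tag,T.xmiID,T.iBegin,T.iEnd,T.Confidence,T.Polarity,T.Uncertainty,T.Conditional,T.Generic,T.HistoryOf,T.CodingScheme,T.Code,T.CUI,T.TUI,T.PreferredText,T.ValueBegin,T.ValueEnd,T.Value,T.Deleted,T.sKey,R.RepType
FROM Recovery..tags AS T
INNER JOIN Recovery..Reps AS R ON T.RepID = R.RepID
WHERE T.iID BETWEEN @lo AND @hi
OPTION (RECOMPILE);  -- compile this statement using the current values of @lo and @hi
```

The trade-off is a fresh compilation on every run, which is negligible next to a multi-minute insert.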

In the plans, here are the relevant sections of the query joining Tags and Reps...

... in A

... and B

Note that in A it takes just over a minute to do the join; in B it takes 6 and a half minutes.

The key thing that appears to take the time is a table scan of the Tags table, which takes 5:44 to complete. The plan uses a table scan because the next time you run the query you may want many more than 1 million rows.

A secondary issue is that the amount of data it reads (or expects to read) from Reps is also way out of whack. In A it expected to read 2 million rows and read 1421; in B it basically read them all (even though technically it probably only needed the same 1421).

I think you have two main approaches to fix this:

1. Look at indexing, to remove the table scan on Tags. Ensure the indexes match what is needed and allow the query to do a scan on that index (it appears that the index at the top of @MikePetri's answer is what you need, or similar). This way, instead of doing a table scan, it can do an index scan which can start 'in the middle' of the data set (a table scan must start at either the start or end of the data set).
2. Separate this into two processes. The first process gets the relevant million rows from Tags and saves them in a temporary table. The second process uses the data in the temporary table to join to Reps (also try using option (recompile) in the second query, so that it checks the temporary table's size before creating the plan).

You can even put an index or two (and/or Primary Key) on that temporary table to make it better for the next step.
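The two-process approach could look roughly like the sketch below. The names #TagsBatch, IX_TagsBatch_RepID, @lo and @hi are hypothetical, and I am assuming iID is an int column and that RepID has a type that can be indexed. Treat it as an outline to adapt rather than a tested script.

```sql
-- Sketch only: #TagsBatch, IX_TagsBatch_RepID, @lo and @hi are hypothetical names.
DECLARE @i  int = 12;
DECLARE @lo int = (1000000 * @i) + 1;   -- 12000001
DECLARE @hi int = (@i + 1) * 1000000;   -- 13000000

-- Step 1: copy the relevant rows from Tags into a temporary table.
SELECT T.iID,T.DT,T.RepID,T.Tag,T.xmiID,T.iBegin,T.iEnd,T.Confidence,T.Polarity,T.Uncertainty,T.Conditional,T.Generic,T.HistoryOf,T.CodingScheme,T.Code,T.CUI,T.TUI,T.PreferredText,T.ValueBegin,T.ValueEnd,T.Value,T.Deleted,T.sKey
INTO #TagsBatch
FROM Recovery..tags AS T
WHERE T.iID BETWEEN @lo AND @hi;

-- Optional: index the temporary table on the join column.
CREATE CLUSTERED INDEX IX_TagsBatch_RepID ON #TagsBatch (RepID);

-- Step 2: join the temporary table to Reps and insert into the target.
INSERT INTO R3..Tags (iID,DT,RepID,Tag,xmiID,iBegin,iEnd,Confidence,Polarity,Uncertainty,Conditional,Generic,HistoryOf,CodingScheme,Code,CUI,TUI,PreferredText,ValueBegin,ValueEnd,Value,Deleted,sKey,RepType)
SELECT B.iID,B.DT,B.RepID,B.Tag,B.xmiID,B.iBegin,B.iEnd,B.Confidence,B.Polarity,B.Uncertainty,B.Conditional,B.Generic,B.HistoryOf,B.CodingScheme,B.Code,B.CUI,B.TUI,B.PreferredText,B.ValueBegin,B.ValueEnd,B.Value,B.Deleted,B.sKey,R.RepType
FROM #TagsBatch AS B
INNER JOIN Recovery..Reps AS R ON B.RepID = R.RepID
OPTION (RECOMPILE);  -- plan is built after the temporary table's size is known

DROP TABLE #TagsBatch;
```

Note that step 1 still filters Tags by a variable range, so without a suitable index on iID it may still scan Tags. The gain here is that the join to Reps is planned against a table whose size is known.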
