Reputation: 6033
Consider the following query:
begin
;with
t1 as (
select top(10) x from tableX
),
t2 as (
select * from t1
),
t3 as (
select * from t1
)
-- --------------------------
select *
from t2
join t3 on t3.x=t2.x
end
go
I was wondering if t1
is called twice hence tableX being called twice (which means t1
acts like a table)?
or just once with its rows saved in t1 for the whole query (like a variable in a programming lang)?
Just trying to figure out how tsql engine optimises this. This is important to know because if t1
has millions of rows and is being called many times in the whole query generating the same result then there should be a better way to do it..
Upvotes: 1
Views: 259
Reputation: 6033
Here is a simple example to test the theories: First, via temporary table which calls the matter only once.
declare @r1 table (id int, v uniqueidentifier);
insert into @r1
SELECT * FROM
(
select id=1, NewId() as 'v' union
select id=2, NewId()
) t
-- -----------
begin
;with
t1 as (
select * from @r1
),
t2 as (
select * from t1
),
t3 as (
select * from t1
)
-- ----------------
select * from t2
union all select * from t3
end
go
On the other hand, if we put the matter inside t1
instead of the temporary table, it gets called twice.
t1 as (
select id=1, NewId() as 'v' union
select id=2, NewId()
)
Hence, my conclusion is to use temporary table and not reply on cached
results.
Also, ive implemented this on a large scale query that called the "matter" twice only and after moving it to temporary table the execution time went straight half!!
Upvotes: 0
Reputation: 67341
To be honest, there is no answer... The only answer is Race your horses (Eric Lippert).
The way you write your query does not tell you, how the engine will put it in execution. This depends on many, many influences...
You tell the engine, what you want to get and the engine decides how to get this.
This may even differ between identical calls depending on statistics, currently running queries, existing cached results etc.
Just as a hint, try this:
USE master;
GO
CREATE DATABASE testDB;
GO
USE testDB;
GO
--I create a physical test table with 1.000.000 rows
CREATE TABLE testTbl(ID INT IDENTITY PRIMARY KEY, SomeValue VARCHAR(100));
WITH MioRows(Nr) AS (SELECT TOP 1000000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values v1 CROSS JOIN master..spt_values v2 CROSS JOIN master..spt_values v3)
INSERT INTO testTbl(SomeValue)
SELECT CONCAT('Test',Nr)
FROM MioRows;
--Now we can start to test this
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE @dt DATETIME2 = SYSUTCDATETIME();
--Your approach with CTEs
;with t1 as (select * from testTbl)
,t2 as (select * from t1)
,t3 as (select * from t1)
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target1
from t2
join t3 on t3.ID=t2.ID;
SELECT 'Final CTE',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE @dt DATETIME2 = SYSUTCDATETIME();
--Writing the intermediate result into a physical table
SELECT * INTO test1 FROM testTbl;
SELECT 'Write into test1',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target2
from test1 t2
join test1 t3 on t3.ID=t2.ID
SELECT 'Final physical table',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE @dt DATETIME2 = SYSUTCDATETIME();
--Same as before, but with an primary key on the intermediate table
SELECT * INTO test2 FROM testTbl;
SELECT 'Write into test2',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
ALTER TABLE test2 ADD PRIMARY KEY (ID);
SELECT 'Add PK',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target3
from test2 t2
join test2 t3 on t3.ID=t2.ID
SELECT 'Final physical tabel with PK',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
--Clean up (Careful with real data!!!)
GO
USE master;
GO
--DROP DATABASE testDB;
GO
On my system the
test1
) and thetest2
and ~650ms for creating the index.Although the query is performed twice, the engine can take advantage of cached results.
The engine is really smart... Don't try to be smarter...
If the table would cover a lot of columns and much more data per row the whole test might return something else...
If your CTEs (sub-queries) involve much more complex data with joins, views, functions and so on, the engine might get into troubles finding the best approach.
If performance matters, you can race your horses to test it out. One hint: I sometimes used a TABLE HINT quite successfully: FORCE ORDER
. This will perform joins in the order specified in the query.
Upvotes: 2
Reputation: 43666
Just create the table:
CREATE TABLE tableX
(
x int PRIMARY KEY
);
INSERT INTO tableX
VALUES (1)
,(2)
Turn on the execution plan generation and execute the query. You will get something like this:
So, yes, the table is queried two times. If you are using complex common table expression and you are working with huge amount of data, I will advice to store the result in temporary table.
Sometimes, I am getting very bad execution plans for complex CTEs which were working nicely in the past. Also, you are allowed to define indexes on temporary tables and improve performance further.
Upvotes: 3