sqlsql-servert-sqlcommon-table-expressionsql-server-2017

Reputation: 651

Optimize temporal table with CTE

I create temporal table in order to set level :

CREATE TABLE [#DesignLvl]
(
    [DesignKey] INT,
    [DesignLevel] INT
);

WITH RCTE AS 
(
    SELECT
        *,
        1 AS [Lvl]
    FROM 
        [Design]
    WHERE 
        [ParentDesignKey] IS NULL

    UNION ALL

    SELECT
        [D].*,
        [Lvl] + 1 AS [Lvl]
    FROM 
        [dbo].[Design] AS [D]
    INNER JOIN 
        [RCTE] AS [rc] ON [rc].[DesignKey] = [D].[ParentDesignKey]
)
INSERT INTO [#DesignLvl]
    SELECT
        [DesignKey], [Lvl]
    FROM 
        [RCTE]

Once created, I used as LEFT JOIN in really big query as:

SELECT... 
FROM.. 
LEFT JOIN [#DesignLvl] AS [dl] ON d.DesignKey = dl.DesignKey
WHERE ...

The query works, but performance has fallen and the query now is too slow. Is there any way to optimize this table?

Execution plan of CTE

I try to add CLUSTERED index as:

CREATE TABLE [#DesignLvl]
(
    [DesignKey] INT,
    [DesignLevel] INT
);

CREATE CLUSTERED INDEX ix_DesignLvl 
    ON [#DesignLvl] ([DesignKey], [DesignLevel]);

Also try:

    CREATE TABLE [#DesignLvl] 
( [DesignKey] INT INDEX IX1 CLUSTERED ,
 [DesignLevel] INT INDEX IX2 NONCLUSTERED );

But I get same result, it took long to execute

Upvotes: 7

Answers (9)

KumarHarsh

Reputation: 5094

Your question is incomplete, "query is slow", but which part of query is slow ?

CTEQuery or LEFT JOIN in really big query

I think Script of big query is required, along with details, like which table contain how many rows, their data type etc.

Throw more details about the big query.

Also let us know if any UDF is involve in join condition.

Why do you left join Temp table ? WHY NOT INNER JOIN

Test the performance separately or CTE and Big Query.

Once use [D].[ParentDesignKey] is not null in recursive part,

SELECT
        [D].*,
        [Lvl] + 1 AS [Lvl]
    FROM 
        [dbo].[Design] AS [D]
    INNER JOIN 
        [RCTE] AS [rc] ON [rc].[DesignKey] = [D].[ParentDesignKey]
and [D].[ParentDesignKey] is not null

NOTE : In CTE use only those columns which is require.

If it is possible to Pre- Calculate [Lvl],because Recursive CTE performance is specially bad involving lot of Records.

How many rows will be process in each CTE Query on average ?

If temp table will hold more than 100 rows then yes create Clustered index on it,

  CREATE CLUSTERED INDEX ix_DesignLvl 
        ON [#DesignLvl] ([DesignKey], [DesignLevel]);

If you are not using [DesignLevel] in join condition then remove from index.

Also ,Reveal index of table [dbo].[Design] and few data of DesignKey and ParentDesignKey.

There are several reason for getting Index Scan, one of them is Selectivity of Key .

So one DesignKey can have how many rows and one ParentDesignKey can have how many rows ?

So depending upon above answer Create Composite Clustered Index on both key of table [dbo].[Design]

So consider my answer is incomplete, I will update it accordingly.

Upvotes: 3

J Sidhu

Reputation: 677

Make sure there are no nulls in DesignKey.ParentDesignKey and #DesignLv1.DesignKey columns and if so, use is not null constraint where you can. i have seen nulls to create cross joins.

If Design table is a transactional table that is being written to very frequently, rebuild indexes on this table frequently.

Create one non clustered index on Design.DesignKey and Design.ParentDesignKey in that sequence.

Create a non clustered index on #DesignLvl DesignKey.

If Design table is large ( > 10 million rows) and a whole bunch of columns, create a indexed view of the distinct columns that you need only for this query and use that.

Check System event log for disk read write failures on disk that has tempdb and (You should put the tempdb on either a RAID 1 or RAID 10 array as they're optimized for high-write applications.) from ( https://searchsqlserver.techtarget.com/tip/SQL-Server-tempdb-best-practices-increase-performance )

Upvotes: 0

shnkr

Reputation: 21

The problem could be with the Design table being huge and joining it with itself without any main filter conditions resulted in scanning of the entire table.

Because you interested in only very few columns like designkey and parentdesignkey, try splitting the data population query (insert into #designlvl) into multiple parts.

Make sure you have a index (designkey,parentdesignkey)

INSERT INTO #DesignLevel
SELECT DISTINCT DesignKey, 1 FROM Design WHERE ParentDesignKey IS NULL

INSERT INTO #DesignLevel
SELECT DISTINCT ParentDesignKey, Lvl+1 FROM Design WHERE ParentDesignKey is NOT NULL

Upvotes: 1

Ryan Sparks

Reputation: 1376

Have you tried changing the SELECT * to just SELECT DesignLevel, I found that for wide rows this was enough to change the execution plan to choose to use an eager spool for the index scan:

WITH RCTE AS 
(
    SELECT
        [DesignKey],
        1 AS [Lvl]
    FROM 
        [Design]
    WHERE 
        [ParentDesignKey] IS NULL

    UNION ALL

    SELECT
        [D].[DesignKey],
        [Lvl] + 1 AS [Lvl]
    FROM 
        [dbo].[Design] AS [D]
    INNER JOIN 
        [RCTE] AS [rc] ON [rc].[DesignKey] = [D].[ParentDesignKey]
)
INSERT INTO [#DesignLvl]
    SELECT
        [DesignKey], [Lvl]
    FROM 
        [RCTE]

Plan and test SQL can be found here: https://www.brentozar.com/pastetheplan/?id=BymxTD4wV

Upvotes: 2

Nițu Alexandru

Reputation: 714

Have you tried memory optimized tables? I used them in a similar process (recursive CTE) and I had spectacular results. In SQL Server 2017 should be included in Standard Edition also. You first need to create a filegroup for memory-optimized data:

ALTER DATABASE MyDB 
ADD FILEGROUP mem_data CONTAINS MEMORY_OPTIMIZED_DATA; 
GO 
ALTER DATABASE MyDB 
ADD FILE (NAME = 'MemData', FILENAME = 'D:\Data\MyDB_MemData.ndf') TO FILEGROUP mem_data;

Then you create (or convert) your table:

CREATETABLE dbo.MemoryTable
(
Col1 INT IDENTITY PRIMARY KEY
...
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

Upvotes: 2

deroby

Reputation: 6002

As others have said, it's not entirely clear what part of your query is slow. Nor do we have any idea about the number of records (could be 100, could be 100 million) or the actual timings (you might consider 10 seconds for loading millions of rows slow?!?).

We also don't know how 'hard' your really big query is; for all we know it could be slow without the LEFT OUTER JOIN too.

Anyway, to get a better idea, what happens if you run this: (untested code, you might need to fix some things)

DECLARE @level int = 0,
        @rowcount int

-- create working table to calculate levels
SELECT lvl = @level,
       D.[DesignKey]
  INTO #hierarchy
  FROM [Design] D
 WHERE D.[ParentDesignKey] IS NULL

SELECT @rowcount = @@ROWCOUNT

PRINT Convert(nvarchar, CURRENT_TIMESTAMP, 113) + ' - Loaded ' + Convert(nvarchar, @rowcount) + N' level ' + Convert(nvarchar, @rowcount) + ' records...'

CREATE UNIQUE CLUSTERED INDEX uq0 ON #hierarchy (lvl, [DesignKey])

WHILE @rowcount > 0
    BEGIN

        INSERT #hierarchy        
        SELECT lvl = @level + 1,
               D.[DesignKey]
          FROM #hierarchy t
          JOIN [Design] D
            ON D.[ParentDesignKey] = t.[DesignKey]
         WHERE t.lvl = @level

        SELECT @rowcount = @@ROWCOUNT,
               @level = @level + 1

        PRINT Convert(nvarchar, CURRENT_TIMESTAMP, 113) + ' - Loaded ' + Convert(nvarchar, @rowcount) + N' level ' + Convert(nvarchar, @rowcount) + ' records...'
    END

GO

-- we now have a lvl value for each DesignKey but the index is backwards for future use; so add index in the other direction
PRINT Convert(nvarchar, CURRENT_TIMESTAMP, 113) + ' - re-indexing...'
CREATE UNIQUE INDEX uq1 ON #hiearchy ([DesignKey]) INCLUDE (lvl) WITH (FILLFACTOR = 100)

PRINT Convert(nvarchar, CURRENT_TIMESTAMP, 113) + ' - done.'

GO

PRINT Convert(nvarchar, CURRENT_TIMESTAMP, 113) + ' - Starting query...'

-- actual use:
;WITH DesignLvlCTE
   AS (SELECT h.lvl, D.*
         FROM [Design] D
         JOIN #hierarchy h
           ON h.[DesignKey] = D.[DesignKey])
SELECT... 
 INTO #result -- leave this in to exclude overhead time of client
FROM.. 
LEFT JOIN DesignLvlCTE AS [dl] ON d.DesignKey = dl.DesignKey
WHERE ...

PRINT Convert(nvarchar, CURRENT_TIMESTAMP, 113) + ' - Done fetching data.'

-- get results
SELECT * FROM #result

PRINT Convert(nvarchar, CURRENT_TIMESTAMP, 113) + ' - Done.'

-- DROP TABLE #result

Upvotes: 2

bruceskyaus

Reputation: 783

The performance could be slower because a clustered index on the dbo.Design table is being accessed inside a nested loop. According to the cost estimate, the database is spending 66% of it's time scanning that index. Looping over that just makes it worse.

See related question

Consider changing the index on dbo.Design to be non-clustered, or try creating another temporary table with a non-clustered index and use that for your recursive query:

CREATE TABLE [#DesignTemp]
(
    ParentDesignKey INT,
    DesignKey INT
);

-- Insert the data, then create the index.
INSERT INTO [#DesignTemp]
SELECT
ParentDesignKey,
DesignKey
FROM [dbo].[Design];

COMMIT;

-- Try this index, or create indexes for individual columns if the plan works better at high volumes.
CREATE NONCLUSTERED INDEX ix_DesignTemp1 ON [#DesignTemp] (ParentDesignKey, DesignKey);

CREATE TABLE [#DesignLvl]
(
    [DesignKey] INT,
    [DesignLevel] INT
);

WITH RCTE AS 
(
    SELECT
        *,
        1 AS [Lvl]
    FROM 
        [DesignTemp]
    WHERE 
        [ParentDesignKey] IS NULL

    UNION ALL

    SELECT
        [D].*,
        [Lvl] + 1 AS [Lvl]
    FROM 
        [DesignTemp] AS [D]
    INNER JOIN 
        [RCTE] AS [rc] ON [rc].[DesignKey] = [D].[ParentDesignKey]
)
INSERT INTO [#DesignLvl]
    SELECT
        [DesignKey], [Lvl]
    FROM 
        [RCTE];

Upvotes: 4

JBJ

Reputation: 403

try @table, you query with a memory temp table instead of a disk temp table

declare @DesignLvl table
(
    [DesignKey] INT,
    [DesignLevel] INT
);

WITH RCTE AS 
(
    SELECT
        *,
        1 AS [Lvl]
    FROM 
        [Design]
    WHERE 
        [ParentDesignKey] IS NULL

    UNION ALL

    SELECT
        [D].*,
        [Lvl] + 1 AS [Lvl]
    FROM 
        [dbo].[Design] AS [D]
    INNER JOIN 
        [RCTE] AS [rc] ON [rc].[DesignKey] = [D].[ParentDesignKey]
)
INSERT INTO @DesignLvl
    SELECT
        [DesignKey], [Lvl]
    FROM 
        [RCTE]

may help a bit, how many rows are we talking about and what sql server edition? @@version?

Upvotes: 2

Luis Cazares

Reputation: 3595

According to my testing published on this article, a set-based loop could give you a performance improvement over a recursive CTE.

DECLARE @DesignLevel int = 0;

INSERT INTO [#DesignLvl]
SELECT [DesignKey], 1
FROM [RCTE];

WHILE @@ROWCOUNT > 0
BEGIN
    SET @DesignLevel += 1;

    INSERT INTO [#DesignLvl]
    SELECT [D].[DesignKey], dl.DesignLevel
    FROM [dbo].[Design] AS [D]
    JOIN [#DesignLvl] AS [dl] ON [dl].[DesignKey] = [D].[ParentDesignKey]
    WHERE dl.DesignLevel = @DesignLevel;
END;

Upvotes: 2

Optimize temporal table with CTE

Answers (9)

Related Questions