Westerlund.io
Westerlund.io

Reputation: 2833

Improve performance on CTE with sub-queries

I have a table with this structure:

WorkerID    Value           GroupID Sequence    Validity
1           '20%'           1       1           2018-01-01
1           '10%'           1       1           2017-06-01
1           'Yes'           1       2           2017-06-01
1           '2018-01-01'    2       1           2017-06-01
1           '17.2'          2       2           2017-06-01
2           '10%'           1       1           2017-06-01
2           'No'            1       2           2017-06-01
2           '2016-03-01'    2       1           2017-06-01
2           '15.9'          2       2           2017-06-01

This structure was created so that the client can create customized data for a worker. For example Group 1 can be something like "Salary" and Sequence is one value that belongs to that Group like "Overtime Compensation". The column Value is a VARCHAR(150) field and the correct validation and conversation is done in another part of the application.

The Validity column exist mainly for historical reasons.

Now I would like to show, for the different workers, the information in a grid where each row should be one worker (displaying the one with the most recent Validity):

Worker  1_1     1_2     2_1         2_2
1       20%     Yes     2018-01-01  17.2
2       10%     No      2016-03-01  15.9

To accomplish this I created a CTE that looks like this:

WITH CTE_worker_grid
    AS
    (
    SELECT
        worker,

        /* 1 */
        (
            SELECT top 1 w.Value
                FROM worker_values AS w
                WHERE w.GroupID = 1
                AND w.Sequence = 1
                ORDER BY w.Validity DESC
        ) AS 1_1,
        (
            SELECT top 1 w.Value
                FROM worker_values AS w
                WHERE w.GroupID = 1
                AND w.Sequence = 2
                ORDER BY w.Validity DESC
        ) AS 1_2,

        /* 2 */
        (
            SELECT top 1 w.Value
                FROM worker_values AS w
                WHERE w.GroupID = 2
                AND w.Sequence = 1
                ORDER BY w.Validity DESC
        ) AS 2_1,
        (
            SELECT top 1 w.Value
                FROM worker_values AS w
                WHERE w.GroupID = 2
                AND w.Sequence = 2
                ORDER BY w.Validity DESC
        ) AS 2_2
    )
GO

This produces the correct result but it's very slow as it creates this grid for over 18'000 worker with almost 30 Groups and up to 20 Sequences in each Group.

How could one speed up the process of a CTE of this magnitude? Should CTE even be used? Can the sub-queries be changed or re-factored out to speed up the execution?

Upvotes: 1

Views: 40

Answers (1)

Pittsburgh DBA
Pittsburgh DBA

Reputation: 6773

Use a PIVOT!

+----------+---------+---------+------------+---------+
| WorkerId | 001_001 | 001_002 |  002_001   | 002_002 |
+----------+---------+---------+------------+---------+
|        1 | 20%     | Yes     | 2018-01-01 |    17.2 |
|        2 | 10%     | No      | 2016-03-01 |    15.9 |
+----------+---------+---------+------------+---------+

SQL Fiddle: http://sqlfiddle.com/#!18/6e768/1

CREATE TABLE WorkerAttributes
    (
    WorkerID INT NOT NULL
    , [Value] VARCHAR(50) NOT NULL
    , GroupID INT NOT NULL
    , [Sequence] INT NOT NULL
    , Validity DATE NOT NULL
    )

INSERT INTO WorkerAttributes
    (WorkerID, Value, GroupID, Sequence, Validity)
VALUES
    (1, '20%', 1, 1, '2018-01-01')
    , (1, '10%', 1, 1, '2017-06-01')
    , (1, 'Yes', 1, 2, '2017-06-01')
    , (1, '2018-01-01', 2, 1, '2017-06-01')
    , (1, '17.2', 2, 2, '2017-06-01')
    , (2, '10%', 1, 1, '2017-06-01')
    , (2, 'No', 1, 2, '2017-06-01')
    , (2, '2016-03-01', 2, 1, '2017-06-01')
    , (2, '15.9', 2, 2, '2017-06-01')


;WITH CTE_WA_RANK
AS
(
SELECT
    ROW_NUMBER() OVER (PARTITION BY WorkerID, GroupID, [Sequence] ORDER BY Validity DESC) AS VersionNumber
    , WA.WorkerID
    , WA.GroupID
    , WA.[Sequence]
    , WA.[Value]
FROM
    WorkerAttributes AS WA
),
CTE_WA
AS
(
SELECT
    WA_RANK.WorkerID
    , RIGHT('000' + CAST(WA_RANK.GroupID AS VARCHAR(3)), 3)
        + '_'
        + RIGHT('000' + CAST(WA_RANK.[Sequence] AS VARCHAR(3)), 3) AS SMART_KEY
    , WA_RANK.[Value]
FROM
    CTE_WA_RANK AS WA_RANK
WHERE
    WA_RANK.VersionNumber = 1
)
SELECT
    WorkerId
    , [001_001] AS [001_001]
    , [001_002] AS [001_002]
    , [002_001] AS [002_001]
    , [002_002] AS [002_002]
FROM
(
SELECT
    CTE_WA.WorkerId
    , CTE_WA.SMART_KEY
    , CTE_WA.[Value]
FROM
    CTE_WA
) AS WA
PIVOT
(
MAX([Value])
FOR
    SMART_KEY IN 
        (
        [001_001]
        , [001_002]
        , [002_001]
        , [002_002]
        )
) AS PVT

Upvotes: 1

Related Questions