user5767413
user5767413

Reputation: 177

Bad Performance in parent-child SQL finding topmost parent in the same row

I have a table that has about 2.2 million rows and for each row I want the topmost parent (root) together on each row if the row has one. Below you can see my Query With only one row.

This statement takes about 45 Seconds and this will take time to run this query. Not all have a parent key so about 1 million do not have a parent. That could be something to think about. But hopefully some of you have a better solution on this problem and I hope you can share it.

WITH allRows
     AS (SELECT Organisasjonsnummer AS ID,
                Navn,
                Organisasjonsnummer   [RootId],
                Navn [RootName]
         FROM   Virksomhetstjeneste.Virksomhet
         WHERE  Hovedenhet_id IS NULL
         UNION ALL
         SELECT a1.Organisasjonsnummer AS ID,
                a1.Navn,
                a2.[RootId],
                a2.[RootName]
         FROM   Virksomhetstjeneste.Virksomhet a1
                JOIN allRows a2
                  ON a2.ID = a1.Hovedenhet_id)
SELECT *
FROM   allRows 
Where ID = 980659763

Result

ID          Navn                    RootId          RootName
980659763   NILLE AS AVD ALTA   953581477   NILLE AS

Upvotes: 1

Views: 124

Answers (1)

Ben Thul
Ben Thul

Reputation: 32697

I'm on record many times as being a fan of hierarchyid. Here's how I'd go about doing it for your situation.

First things first: forgive me for not using your table and column names; I don't speak Norwegian and going back and forth between English and that was error prone for me. So here's the setup:

USE [tempdb];

IF OBJECT_ID('dbo.myTable') IS NOT NULL
    DROP TABLE [dbo].[myTable];

CREATE TABLE [dbo].[myTable]
    (
      [ID] INT NOT NULL ,
      CONSTRAINT [PK_myTable] PRIMARY KEY ( [ID] ) ,
      [ParentID] INT NULL ,
      [Name] VARCHAR(50) NOT NULL ,
      [Path] HIERARCHYID NULL,
      [Root] AS [Path].GetAncestor([Path].GetLevel() - 1) PERSISTED
    );

INSERT  INTO [dbo].[myTable]
        ( [ID], [ParentID], [Name] )
VALUES  ( 1, NULL, '1' ),
        ( 2, 1, '2' ),
        ( 3, 1, '3' ),
        ( 4, 2, '4' );
WITH    [allRows]
          AS (
               SELECT   [ID] ,
                        [ParentID] ,
                        CAST(CONCAT('/', [ID], '/') AS VARCHAR(MAX)) AS [Path]
               FROM     [dbo].[myTable]
               WHERE    [ParentID] IS NULL
               UNION ALL
               SELECT   [child].[ID] ,
                        [child].[ParentID] ,
                        CAST(CONCAT([parent].[Path], [child].[ID], '/') AS VARCHAR(MAX)) AS [Path]
               FROM     [dbo].[myTable] AS [child]
               JOIN     [allRows] AS [parent]
                        ON [parent].[ID] = [child].[ParentID]
             )
    UPDATE  [m]
    SET     [m].[Path] = [a].[Path]
    FROM    [dbo].[myTable] AS [m]
    JOIN    [allRows] AS [a]
            ON [a].[ID] = [m].[ID];

This is just your standard recursive CTE to walk a parent/child hierarchy. The trick here though is that I'm calculating something that I can use as a hierarchyid as I go. Once the hierarchy is walked, I update the base table with the calculated hierarchy.

Since you mentioned that your table is large-ish, you may want to batch those updates. I'll leave that as an exercise for the reader. Also keep in mind that this is a one-time operation (though you will have to keep the [Path] column up to date for inserts/updates/deletes; I also leave this as an exercise for the reader).

Now that you've got the hierarchy stored in row, you can do magic:

SELECT  [child].[ID] ,
        [child].[Name] ,
        [root].[ID] ,
        [root].[Name]
FROM    [dbo].[myTable] AS [child]
JOIN    [dbo].[myTable] AS [root]
    ON [root].[Path] = [child].[Root]
WHERE child.[ID] = 4;

Which is to say that I can now get the top-level ancestor for a given ID with just a join. Having the root be a persisted computed column is a) fancy and b) unnecessary; it just made the last select a lot cleaner.

If you don't want to do that, you can drop the [Root] column completely and then the join predicate becomes [root].[Path] = [child].[Path].GetAncestor([Path].GetLevel() - 1).

Lastly, keep in mind that the hierarchyid data type is natively indexable. So you can index [Path], [Root], or both and it will likely improve performance.

Upvotes: 1

Related Questions