Reputation: 177
I have a table that has about 2.2 million rows and for each row I want the topmost parent (root) together on each row if the row has one. Below you can see my Query With only one row.
This statement takes about 45 Seconds and this will take time to run this query. Not all have a parent key so about 1 million do not have a parent. That could be something to think about. But hopefully some of you have a better solution on this problem and I hope you can share it.
WITH allRows
AS (SELECT Organisasjonsnummer AS ID,
Navn,
Organisasjonsnummer [RootId],
Navn [RootName]
FROM Virksomhetstjeneste.Virksomhet
WHERE Hovedenhet_id IS NULL
UNION ALL
SELECT a1.Organisasjonsnummer AS ID,
a1.Navn,
a2.[RootId],
a2.[RootName]
FROM Virksomhetstjeneste.Virksomhet a1
JOIN allRows a2
ON a2.ID = a1.Hovedenhet_id)
SELECT *
FROM allRows
Where ID = 980659763
Result
ID Navn RootId RootName
980659763 NILLE AS AVD ALTA 953581477 NILLE AS
Upvotes: 1
Views: 124
Reputation: 32697
I'm on record many times as being a fan of hierarchyid
. Here's how I'd go about doing it for your situation.
First things first: forgive me for not using your table and column names; I don't speak Norwegian and going back and forth between English and that was error prone for me. So here's the setup:
USE [tempdb];
IF OBJECT_ID('dbo.myTable') IS NOT NULL
DROP TABLE [dbo].[myTable];
CREATE TABLE [dbo].[myTable]
(
[ID] INT NOT NULL ,
CONSTRAINT [PK_myTable] PRIMARY KEY ( [ID] ) ,
[ParentID] INT NULL ,
[Name] VARCHAR(50) NOT NULL ,
[Path] HIERARCHYID NULL,
[Root] AS [Path].GetAncestor([Path].GetLevel() - 1) PERSISTED
);
INSERT INTO [dbo].[myTable]
( [ID], [ParentID], [Name] )
VALUES ( 1, NULL, '1' ),
( 2, 1, '2' ),
( 3, 1, '3' ),
( 4, 2, '4' );
WITH [allRows]
AS (
SELECT [ID] ,
[ParentID] ,
CAST(CONCAT('/', [ID], '/') AS VARCHAR(MAX)) AS [Path]
FROM [dbo].[myTable]
WHERE [ParentID] IS NULL
UNION ALL
SELECT [child].[ID] ,
[child].[ParentID] ,
CAST(CONCAT([parent].[Path], [child].[ID], '/') AS VARCHAR(MAX)) AS [Path]
FROM [dbo].[myTable] AS [child]
JOIN [allRows] AS [parent]
ON [parent].[ID] = [child].[ParentID]
)
UPDATE [m]
SET [m].[Path] = [a].[Path]
FROM [dbo].[myTable] AS [m]
JOIN [allRows] AS [a]
ON [a].[ID] = [m].[ID];
This is just your standard recursive CTE to walk a parent/child hierarchy. The trick here though is that I'm calculating something that I can use as a hierarchyid
as I go. Once the hierarchy is walked, I update the base table with the calculated hierarchy.
Since you mentioned that your table is large-ish, you may want to batch those updates. I'll leave that as an exercise for the reader. Also keep in mind that this is a one-time operation (though you will have to keep the [Path]
column up to date for inserts/updates/deletes; I also leave this as an exercise for the reader).
Now that you've got the hierarchy stored in row, you can do magic:
SELECT [child].[ID] ,
[child].[Name] ,
[root].[ID] ,
[root].[Name]
FROM [dbo].[myTable] AS [child]
JOIN [dbo].[myTable] AS [root]
ON [root].[Path] = [child].[Root]
WHERE child.[ID] = 4;
Which is to say that I can now get the top-level ancestor for a given ID with just a join. Having the root be a persisted computed column is a) fancy and b) unnecessary; it just made the last select a lot cleaner.
If you don't want to do that, you can drop the [Root]
column completely and then the join predicate becomes [root].[Path] = [child].[Path].GetAncestor([Path].GetLevel() - 1)
.
Lastly, keep in mind that the hierarchyid
data type is natively indexable. So you can index [Path]
, [Root]
, or both and it will likely improve performance.
Upvotes: 1