joakim
joakim

Reputation: 1732

The most efficient way to find the top level parent in SQL Server?

Given the following table

catName     catID     parentID
=================================
vehicles    1         0
cars        2         1
sedans      3         2
animals     4         0
cows        5         4

Given a catID, I need to find its top level parent (parentID = 0).

This query is executed 50-100 times a day. There are currently 100-200 rows (maybe more in the future). Up to 8 levels deep. I'm thinking three alternatives:

  1. Using a recursive method
  2. Creating a view
  3. Adding another column topParentID (least favorable)

Which will be the most efficient?

Upvotes: 3

Views: 4057

Answers (2)

Bogdan Sahlean
Bogdan Sahlean

Reputation: 1

SQL2008+:

To store hierarchies , SQL Server includes HIERARCHYID data type. Above data can be "converted" to use HIERARCHYID "values" thus:

catName     catID     parentID  hierarchyNode
=============================================
vehicles    1         0         /1/
cars        2         1         /1/2/
sedans      3         2         /1/2/3/
animals     4         0         /4/
cows        5         4         /4/5/

After conversion, I would drop parentID column.

HIERARCHYID is SQLCLR system data type which include following methods:

To get parent node I would use these methods thus:

DECLARE @node HIERARCHYID
SET     @node = '/1/2/3/'

SELECT  
    currentNodeLvl= @node.GetLevel(),                                 --> 3
    parentAsHID   = @node.GetAncestor(@node.GetLevel() - 1),          --> 0x58
    parentAsString= @node.GetAncestor(@node.GetLevel() - 1).ToString()--> /1/

More, I would create an index on hierarchyNode column thus:

CREATE UNIQUE INDEX IUN_Table_hierarchyNode
ON dbo.Table(hierarchyNode)

and final query will be:

SELECT ..., prt.catID AS parentID
FROM dbo.Table crt -- Curent node
LEFT/INNER JOIN -- It depends on hierarchyID nullability 
dbo.MyTable prt -- Parent node
ON @node.GetAncestor(crt.hierarchyID.GetLevel() - 1).ToString() = prt.hierarchyID

Upvotes: 4

Ryan Nigro
Ryan Nigro

Reputation: 4619

TLDR

For large data sets, use a bridge table.


Bridge table details

At scale (data warehouse scale), the most efficient way I know to solve this is via a bridge table. Check out http://www.askjohnobiee.com/2013/08/how-to-bridge-tables-and-many-to-many.html for the basics.

You end up maintaining a physical table that has a row which matches every parent object to every child and subchild object (and from each object to itself). It's maintenance heavy (since it has to be updated whenever any of the relationships change), but it's extremely efficient to query, even for large data sets.

In your example, I'd build (and fill) a bridge table with the following schema:

CREATE TABLE BridgeCategories
(
 ParentCategoryID int,
 ParentCategoryLevel tinyint,
 ChildCategoryID int,
 ChildCategoryLevel tinyint,
 PRIMARY KEY (ParentCategoryID, ChildCategoryID)
)

-- add an index for your particular query
CREATE NONCLUSTERED INDEX ix_BridgeCategories_ByChildIDAndParentLevel
ON BridgeCategories (ChildCategoryID, ParentCategoryLevel) INCLUDE (ParentCategoryID)

When you populate the BridgeCategories table, I'd set the appropriate levels according to how far down they are in the hierarchy (ie, in your example, category ID 1 - "vehicles" would have a level of 0, while ID 2 - "sedans" would have a level of 2).

In your example, (I think) the following query would fill a bridge table with the above structure, assuming your source table is called DimCategories.

TRUNCATE TABLE BridgeCategories;

declare @currentLevel tinyint;
declare @MAX_LEVELS tinyint;

set @currentLevel = 0;
set @MAX_LEVELS = 16;

-- seed your root level entries
insert into BridgeCategories
SELECT
    catID, @currentLevel, catID, @currentLevel
from DimCategories c
WHERE
    c.parentID = 0;

set @currentLevel = @currentLevel + 1

while (@currentLevel < @MAX_LEVELS)
BEGIN
    -- add any current level parent -> child mappings   
    insert into BridgeCategories
    SELECT
        b.ParentCategoryID,
        b.ParentCategoryLevel,
        c.catID,
        @currentLevel
    from BridgeCategories b
    join DimCategories c
        on b.ChildCategoryID = c.parentID


    -- add the current level self-referencing entries
    insert into BridgeCategories
    SELECT
        c.catID,
        @currentLevel,
        c.catID,
        @currentLevel
    from BridgeCategories b
    join DimCategories c
        on b.ChildCategoryID = c.parentID
    group by c.catID

    set @currentLevel = @currentLevel + 1
end

With this structure in place, you could run the following query to get the root parent catID of any @catID.

select
  ParentCategoryID
from BridgeCategories b
where
  b.ChildCategoryID = @catID
  and b.ParentCategoryLevel = 0

You can look up more information on Ralph Kimball data warehousing concepts if you want to dive deeper.

Upvotes: 2

Related Questions