Reputation: 1732
Given the following table
catName catID parentID
=================================
vehicles 1 0
cars 2 1
sedans 3 2
animals 4 0
cows 5 4
Given a catID
, I need to find its top level parent (parentID = 0
).
This query is executed 50-100 times a day. There are currently 100-200 rows (maybe more in the future). Up to 8 levels deep. I'm thinking three alternatives:
topParentID
(least favorable)Which will be the most efficient?
Upvotes: 3
Views: 4057
Reputation: 1
SQL2008+:
To store hierarchies , SQL Server includes HIERARCHYID data type. Above data can be "converted" to use HIERARCHYID
"values" thus:
catName catID parentID hierarchyNode
=============================================
vehicles 1 0 /1/
cars 2 1 /1/2/
sedans 3 2 /1/2/3/
animals 4 0 /4/
cows 5 4 /4/5/
After conversion, I would drop parentID
column.
HIERARCHYID is SQLCLR system data type which include following methods:
To get parent node I would use these methods thus:
DECLARE @node HIERARCHYID
SET @node = '/1/2/3/'
SELECT
currentNodeLvl= @node.GetLevel(), --> 3
parentAsHID = @node.GetAncestor(@node.GetLevel() - 1), --> 0x58
parentAsString= @node.GetAncestor(@node.GetLevel() - 1).ToString()--> /1/
More, I would create an index on hierarchyNode
column thus:
CREATE UNIQUE INDEX IUN_Table_hierarchyNode
ON dbo.Table(hierarchyNode)
and final query will be:
SELECT ..., prt.catID AS parentID
FROM dbo.Table crt -- Curent node
LEFT/INNER JOIN -- It depends on hierarchyID nullability
dbo.MyTable prt -- Parent node
ON @node.GetAncestor(crt.hierarchyID.GetLevel() - 1).ToString() = prt.hierarchyID
Upvotes: 4
Reputation: 4619
For large data sets, use a bridge table.
At scale (data warehouse scale), the most efficient way I know to solve this is via a bridge table. Check out http://www.askjohnobiee.com/2013/08/how-to-bridge-tables-and-many-to-many.html for the basics.
You end up maintaining a physical table that has a row which matches every parent object to every child and subchild object (and from each object to itself). It's maintenance heavy (since it has to be updated whenever any of the relationships change), but it's extremely efficient to query, even for large data sets.
In your example, I'd build (and fill) a bridge table with the following schema:
CREATE TABLE BridgeCategories
(
ParentCategoryID int,
ParentCategoryLevel tinyint,
ChildCategoryID int,
ChildCategoryLevel tinyint,
PRIMARY KEY (ParentCategoryID, ChildCategoryID)
)
-- add an index for your particular query
CREATE NONCLUSTERED INDEX ix_BridgeCategories_ByChildIDAndParentLevel
ON BridgeCategories (ChildCategoryID, ParentCategoryLevel) INCLUDE (ParentCategoryID)
When you populate the BridgeCategories
table, I'd set the appropriate levels according to how far down they are in the hierarchy (ie, in your example, category ID 1 - "vehicles" would have a level of 0, while ID 2 - "sedans" would have a level of 2).
In your example, (I think) the following query would fill a bridge table with the above structure, assuming your source table is called DimCategories
.
TRUNCATE TABLE BridgeCategories;
declare @currentLevel tinyint;
declare @MAX_LEVELS tinyint;
set @currentLevel = 0;
set @MAX_LEVELS = 16;
-- seed your root level entries
insert into BridgeCategories
SELECT
catID, @currentLevel, catID, @currentLevel
from DimCategories c
WHERE
c.parentID = 0;
set @currentLevel = @currentLevel + 1
while (@currentLevel < @MAX_LEVELS)
BEGIN
-- add any current level parent -> child mappings
insert into BridgeCategories
SELECT
b.ParentCategoryID,
b.ParentCategoryLevel,
c.catID,
@currentLevel
from BridgeCategories b
join DimCategories c
on b.ChildCategoryID = c.parentID
-- add the current level self-referencing entries
insert into BridgeCategories
SELECT
c.catID,
@currentLevel,
c.catID,
@currentLevel
from BridgeCategories b
join DimCategories c
on b.ChildCategoryID = c.parentID
group by c.catID
set @currentLevel = @currentLevel + 1
end
With this structure in place, you could run the following query to get the root parent catID of any @catID.
select
ParentCategoryID
from BridgeCategories b
where
b.ChildCategoryID = @catID
and b.ParentCategoryLevel = 0
You can look up more information on Ralph Kimball data warehousing concepts if you want to dive deeper.
Upvotes: 2