Reputation: 33
I have 2 tables.
One lists approx 60 root folders with id, in this case "docnumber."
The other table has approx 2.6 million rows in it made up of folders and documents each with an id of "docnumber" again. Each row has a type of either F or D depending on whether it is a document or folder.
Root Table
docnumber, name
100, test
200, anothertest
Folder/Document Table
type, parentnumber, docnumber, name
D, 500, 600, testdoc
F, 300, 500, testfolder
F, 200, 300, testsubfolder
How can I go through every row of type D in the folder/document table and add a field called TopLevelFolder that will pick out the top level folder from the root table for that document? It would need to go up through all the folder levels to find the top level folder from the root table.
Many thanks!
Upvotes: 2
Views: 93
Reputation: 952
Essentially, you need only a recursive CTE to assemble the hierarchy tree, and a ranking function to mark the top tier.
;WITH Hierarchy AS(
SELECT ParentNumber, DocNumber, 1 Tier
FROM FolderDocument
WHERE [Type] = 'D'
UNION ALL
SELECT fd.ParentNumber, h.DocNumber, h.Tier + 1
FROM FolderDocument fd
INNER JOIN Hierarchy h
ON fd.DocNumber = h.ParentNumber
WHERE fd.ParentNumber IS NOT NULL
)
,HierarchyRank AS(
SELECT *, RANK() OVER (
PARTITION BY h.DocNumber ORDER BY h.Tier DESC) RankNum
FROM Hierarchy h
)
SELECT hr.DocNumber, hr.ParentNumber AS TopFolderNumber, fdd.Name AS DocName, fdf.Name AS TopFolderName
FROM HierarchyRank hr
LEFT JOIN FolderDocument fdd
ON hr.DocNumber = fdd.DocNumber
LEFT JOIN FolderDocument fdf
ON hr.ParentNumber = fdf.DocNumber
WHERE RankNum = 1
Note that this will return multiple top-level folders if such exist. If you want to implement a tie-breaker, change the RANK() function to a ROW_NUMBER() and use the ORDER BY clause in the OVER to specify an order that determines the "winner".
This solution also assumes that FolderDocument contains records for the top-level nodes of the tree (which will themselves have null parents). It is simple enough to change the final query to look the top node up from your Root table instead if that is not the case.
I've made a SQLFiddle to show the handling of multiple top folders and a ragged hierarchy.
Edit per request: To use Root as the source for the parent folder name, you can simply do thus in the final query:
SELECT hr.DocNumber, hr.ParentNumber AS TopFolderNumber, fdd.Name AS DocName, r.Name AS TopFolderName
FROM HierarchyRank hr
LEFT JOIN FolderDocument fdd
ON hr.DocNumber = fdd.DocNumber
LEFT JOIN Root r
ON hr.ParentNumber = r.DocNumber
WHERE RankNum = 1
Upvotes: 1