Marcus T-M
Marcus T-M

Reputation: 33

Recursive challenge in SQL

I have 2 tables.

One lists approx 60 root folders with id, in this case "docnumber."

The other table has approx 2.6 million rows in it made up of folders and documents each with an id of "docnumber" again. Each row has a type of either F or D depending on whether it is a document or folder.

Root Table

docnumber, name

100, test

200, anothertest

Folder/Document Table

type, parentnumber, docnumber, name

D, 500, 600, testdoc

F, 300, 500, testfolder

F, 200, 300, testsubfolder

How can I go through every row of type D in the folder/document table and add a field called TopLevelFolder that will pick out the top level folder from the root table for that document? It would need to go up through all the folder levels to find the top level folder from the root table.

Many thanks!

Upvotes: 2

Views: 93

Answers (1)

Jonathan Van Matre
Jonathan Van Matre

Reputation: 952

Essentially, you need only a recursive CTE to assemble the hierarchy tree, and a ranking function to mark the top tier.

;WITH Hierarchy AS(
    SELECT ParentNumber, DocNumber, 1 Tier 
    FROM FolderDocument 
    WHERE [Type] = 'D'
    UNION ALL
    SELECT fd.ParentNumber, h.DocNumber, h.Tier + 1
    FROM FolderDocument fd
    INNER JOIN Hierarchy h
    ON fd.DocNumber = h.ParentNumber
    WHERE fd.ParentNumber IS NOT NULL
)
,HierarchyRank AS(
    SELECT *, RANK() OVER (
        PARTITION BY h.DocNumber ORDER BY h.Tier DESC) RankNum
    FROM Hierarchy h
)
SELECT hr.DocNumber, hr.ParentNumber AS TopFolderNumber, fdd.Name AS DocName, fdf.Name AS TopFolderName
FROM HierarchyRank hr
LEFT JOIN FolderDocument fdd
ON hr.DocNumber = fdd.DocNumber
LEFT JOIN FolderDocument fdf
ON hr.ParentNumber = fdf.DocNumber
WHERE RankNum = 1

Note that this will return multiple top-level folders if such exist. If you want to implement a tie-breaker, change the RANK() function to a ROW_NUMBER() and use the ORDER BY clause in the OVER to specify an order that determines the "winner".

This solution also assumes that FolderDocument contains records for the top-level nodes of the tree (which will themselves have null parents). It is simple enough to change the final query to look the top node up from your Root table instead if that is not the case.

I've made a SQLFiddle to show the handling of multiple top folders and a ragged hierarchy.

Edit per request: To use Root as the source for the parent folder name, you can simply do thus in the final query:

SELECT hr.DocNumber, hr.ParentNumber AS TopFolderNumber, fdd.Name AS DocName, r.Name AS TopFolderName
FROM HierarchyRank hr
LEFT JOIN FolderDocument fdd
ON hr.DocNumber = fdd.DocNumber
LEFT JOIN Root r
ON hr.ParentNumber = r.DocNumber
WHERE RankNum = 1

Upvotes: 1

Related Questions