Reputation: 33
I have a table "People" with a primary key "PersonID" and a field that is "Supervisor". The "Supervisor" field contains the foreign key of a "PersonID" to create a self join.
I would like to create an sql query that returns all people with "Me" (the PersonID that is logged into the database) as their supervisor, and anyone that has someone on that list labeled as their supervisor. Essentially I would like to list anyone below the supplied PersonID in the chain of command.
Upvotes: 2
Views: 1940
Reputation: 33
After considering the options presented here I have decided that I am going about this the wrong way. I have added a field to the "People" table "PermissionsLevel" which is a lookup from another table with a simple "PermissionNumber" and "PermissionDescription". I then use a select case in the Form_load() event for the logged in user's permission level.
Select Case userPermissionLevel
Case Creator
'Queries everyone in the database
Case Administrator
'Queries everyone in the "Department" they are a member of
Case Supervisor
'Queries all people WHERE supervisor = userID OR _
supervisor IN (Select PersonID From People WHERE supervisor = userID)
Case Custodian '(Person in charge of maintaining the HAZMAT Cabinet and SDS)
'Queries WHERE supervisor = DLookup("Supervisor", "People", "PersonID = " & userID)
Upvotes: 0
Reputation: 1426
SQL is great for many things, but hierarchical data is one of bigger challenges. Some vendors has provided custom extensions to work around this (e.g. Oracle's CONNECT
syntax or SQL Server's hierarchyid
data type), but we probably want to keep this standard SQL1.
What you have modeled is called "adjacency list" -- this is very simple and straightforward, and always consistent2. But as you found out, this sucks for querying, especially for an unknown depth or for a subtree, rather than from the root node.
Therefore, we need to supplement this with an additional model. There are basically 3 other models that you should use in conjunction with the adjacency list model.
To study them in depth, we'll use this diagram:
For this discussion, we are also assuming this is a simple hierarchy, that there are no cycles.
Joe Celko's Nested Sets.
Basically, you store the "Left" and "Right" value of each node which indicates its position in the tree. The root node will always have 1
for "Left" and <count of nodes * 2>
for "Right". This is easier to illustrate with a diagram:
Note that each node gets assigned a pair of number, one for "Left", and other for "Right". With that information, you can do some logical deductions. Finding all children becomes easy - you filter for values where the nodes' "Left" is greater than the target node's "Left" and where the same nodes' "Right" is smaller than the target node's "Right".
The biggest downside with the model is that a change to the hierarchy almost invariably requires updating the entire tree, which makes it very awful to maintain for a fast moving charts. If this is something you only update once a year, this might be acceptable.
The other issue with this model is that if there is a need for a multiple hierarchies, the nested set will not work without additional columns to track the separate hierarchy.
Materialized Path
You know how a filesystem path works, right? This is basically the same thing, except that we are storing this in the database3. For instance, a possible implementation of a materialized path might look like this:
ID Name Path
1 Alice 1/
2 Bob 1/2/
3 Christina 1/3/
4 Dwayne 1/4/
5 Erin 1/2/5/
6 Frank 1/2/6/
7 Georgia 1/2/7/
8 Harry 1/2/7/8/
9 Isabella 1/3/9/
10 Jake 1/3/10/
11 Kirby 1/3/10/11/
12 Lana 1/3/12/
13 Mike 1/4/13/
14 Norma 1/4/13/14/
15 Opus 1/4/15/
16 Rianna 1/4/16/
This is quite intuitive and can perform OK as long you write your SQL queries to use predicates like WHERE Path LIKE '1/4/*'
. Engines will be able to use index on the path column. Note that if your queries involve querying a middle of the tree or from bottom up, that means index cannot be used and performance will suffer for it. But programming against a materialized path is pretty easy to understand. Updating a part of the tree won't propagate to unrelated nodes as the nested sets so that's also a plus in its favor.
The biggest downside is that to be indexable, the text has to be a short column. For Access database that puts a 255 character limit on your path field. Even worse, there is no good way to predict when you are about to hit the limit -- you could hit it because you have too deep tree, or because you have too wide tree (e.g. bigger numbers taking up too much spaces). For that reason, large trees might necessitate some hard-coded limit to avoid this situation.
Ancestry Traversal Closure
This model involves a separate table which is updated whenever the employee table is updated. Instead of only recording the immediate relationship, we enumerate all the ancestry between two nodes. To illustrate, this is how the table will look like:
Employee table:
ID Name
1 Alice
2 Bob
3 Christina
4 Dwayne
5 Erin
6 Frank
7 Georgia
8 Harry
9 Isabella
10 Jake
11 Kirby
12 Lana
13 Mike
14 Norma
15 Opus
16 Rianna
Employee Ancestry Table:
Origin Ancestor
1 1
2 1
2 2
3 1
3 3
4 1
4 4
5 1
5 2
5 5
6 1
6 2
6 6
7 1
7 2
7 7
8 1
8 2
8 7
8 8
9 1
9 3
9 9
10 1
10 3
10 10
11 1
11 3
11 10
11 11
12 1
12 3
12 12
13 1
13 4
14 1
14 4
14 13
14 14
15 1
15 4
15 15
16 1
16 4
16 16
As you see, we generate several rows worth of all possible relationship between two nodes. As a bonus because it's a table, we can make use of foreign key and cascade delete to help keep it consistent. We still have to manually manage the inserts & updates however. Because the table is also narrow, it makes it very easy to create query that can leverage index on the key, the origin and the ancestor to find the subtree, the children, the parent. This is the most flexible system at expense of extra complexity around the maintenance.
Maintaining the model
All 3 models discussed are basically denormalizing the data a bit in order to simplify the query and support an arbitrary depth search. A consequence of that is this necessitates us to manually manage the changes when the employee table is modified in some fashion.
The most simplest approach is simply to just write a VBA procedure that will truncate and re-build the entire chart using your preferred model. This can work very well when the chart is small or does not change often.
On the other end, you could consider using Data Macros on your employee table to perform the maintenance required to propagate the updates to the hierarchy. A caveat, though, if you use data macros, this makes it harder to port the data to another RDBMS system since none of those support data macros. (To be fair, the problem would still exist if you were porting from SQL Server's stored procedures/triggers to Oracle's stored procedure/triggers - those are very steeped in vendor's dialect that porting is a challenge). Using data macros or trigger + stored procedure mean that you can rely on the engine to maintain the hierarchy for you without any programming in the forms.
A common temptation is to use form's AfterUpdate
event to maintain the changes and that would work.... unless someone update it outside the form. For that reason, I would actually prefer that we used a data macro rather than relying on everyone to always use the form.
Note that in all of this discussion, we should NOT discard the adjacency list model. As I commented earlier, this is the most normalized and consistent way to model the hierarchy. It is literally impossible to create a nonsensical hierarchy with it. For that reason alone, you should keep it as your "authoritative truth", which you can then build your model upon to aid the querying performance.
Another good reason to keep using the adjacency list model is regardless of which model you use above, they introduce either additional columns or additional tables that are not meant to be directly edited by users but are for purpose somewhat equivalent to a calculated field and thus should not be tinkered with. If the users are allowed to edit only the SupervisorID
field, then it becomes easy to code your data macros/triggers/VBA procedure around that one field, and updating the "calculations" of the additional fields/table to ensure correctness for the queries depending on such models.
1. SQL Standard does describe a way to create a recursive query. However, the compliance for that particular feature seems to be poor. Furthermore, the performance may not be that great. (which is the case with SQL Server's particular implementation) The 3 models discussed are easily implemented in most of RDBMS and queries for querying the hierarchy can be easily written and ported. However, the implementation to automatically manage the changes to the hierarchy invariably requires vendor-specific dialect, using triggers or stored procedure which is not very portable.
2. When I say consistent, I only mean that the model cannot create a nonsensical output. It's still possible to provide wrong data and make a weird hierarchy such as an employee's supervisor reporting to the employee, but not one that would give undefined results. However, it still is a hierarchy (even if it ends up as a cyclical graph). With other models, failing to maintain the derived data correctly means the queries will start returning undefined results.
3. SQL Server's hierarchyid
data type is in fact an implementation of this model.
Upvotes: 3
Reputation: 55841
As you probably will have a rather limited count, say six, of levels deep, you can use a query with subqueries with subqueries ... etc. Very simple.
For an unlimited number of levels, the fastest way I've found, is to create a lookup function which walks the tree for each record. This can output either the level of the record or a compound key build by the key of the record and all keys above.
As the lookup function will use the same recordset for every call, you can make it static, and (for JET) you can improve further by using Seek to locate the records.
Here's an example which will give you an idea:
Public Function RecursiveLookup(ByVal lngID As Long) As String
Static dbs As Database
Static tbl As TableDef
Static rst As Recordset
Dim lngLevel As Long
Dim strAccount As String
If dbs Is Nothing Then
' For testing only.
' Replace with OpenDatabase of backend database file.
Set dbs = CurrentDb()
Set tbl = dbs.TableDefs("tblAccount")
Set rst = dbs.OpenRecordset(tbl.Name, dbOpenTable)
End If
With rst
.Index = "PrimaryKey"
While lngID > 0
.Seek "=", lngID
If Not .NoMatch Then
lngLevel = lngLevel + 1
lngID = !MasterAccountFK.Value
If lngID > 0 Then
strAccount = str(!AccountID) & strAccount
End If
Else
lngID = 0
End If
Wend
' Leave recordset open.
' .Close
End With
' Don't terminate static objects.
' Set rst = Nothing
' Set tbl = Nothing
' Set dbs = Nothing
' Alternative expression for returning the level.
' (Adjust vartype of return value of function.) ' RecursiveLookup = lngLevel ' As Long
RecursiveLookup = strAccount
End Function
This assumes a table with a primary key ID and a foreign (master) key pointing to the parent record - and a top level record (not used) with a visible key (AccountID) of 0.
Now your tree will be nicely shown almost instantaneously using a query like this where Account will be the visible compound key:
SELECT
*, RecursiveLookup([ID]) AS Account
FROM
tblAccount
WHERE
(AccountID > 0)
ORDER BY
RecursiveLookup([ID]);
If you wish to use this to add records to another table, you should not make an SQL call for each, as this is very slow, but first open a recordset, then use AddNew-Update to append each record and, finally, close this recordset.
Upvotes: 1
Reputation: 16015
Consider the following set of functions:
Function BuildQuerySQL(lngsid As Long) As String
Dim intlvl As Integer
Dim strsel As String: strsel = selsql(intlvl)
Dim strfrm As String: strfrm = "people as p0 "
Dim strwhr As String: strwhr = "where p0.supervisor = " & lngsid
While HasRecordsP(strsel & strfrm & strwhr)
intlvl = intlvl + 1
BuildQuerySQL = BuildQuerySQL & " union " & strsel & strfrm & strwhr
strsel = selsql(intlvl)
If intlvl > 1 Then
strfrm = "(" & strfrm & ")" & frmsql(intlvl)
Else
strfrm = strfrm & frmsql(intlvl)
End If
Wend
BuildQuerySQL = Mid(BuildQuerySQL, 8)
End Function
Function HasRecordsP(strSQL As String) As Boolean
Dim dbs As DAO.Database
Set dbs = CurrentDb
With dbs.OpenRecordset(strSQL)
HasRecordsP = Not .EOF
.Close
End With
Set dbs = Nothing
End Function
Function selsql(intlvl As Integer) As String
selsql = "select p" & intlvl & ".personid from "
End Function
Function frmsql(intlvl As Integer) As String
frmsql = " inner join people as p" & intlvl & " on p" & intlvl - 1 & ".personid = p" & intlvl & ".supervisor "
End Function
Here, the BuildQuerySQL
function may be supplied with the PersonID
corresponding to a Supervisor
and the function will return 'recursive' SQL code for an appropriate query to obtain the PersonID
for all subordinates of the supervisor.
Such function may therefore be evaluated to construct a saved query, e.g. for a supervisor with PersonID = 5
, creating a query called Subordinates
:
Sub test()
CurrentDb.CreateQueryDef "Subordinates", BuildQuerySQL(5)
End Sub
Or the SQL may be evaluated to open a RecordSet of the results perhaps, depending on the requirements of your application.
Note that the function constructs a UNION
query, with each level of nesting unioned with the previous query.
Upvotes: 0