Reputation: 33

MS-Access SQL Self Join with multiple levels

I have a table "People" with a primary key "PersonID" and a field that is "Supervisor". The "Supervisor" field contains the foreign key of a "PersonID" to create a self join.

I would like to create an sql query that returns all people with "Me" (the PersonID that is logged into the database) as their supervisor, and anyone that has someone on that list labeled as their supervisor. Essentially I would like to list anyone below the supplied PersonID in the chain of command.

Upvotes: 2

Answers (4)

Josh Johner

Reputation: 33

After considering the options presented here I have decided that I am going about this the wrong way. I have added a field to the "People" table "PermissionsLevel" which is a lookup from another table with a simple "PermissionNumber" and "PermissionDescription". I then use a select case in the Form_load() event for the logged in user's permission level.

Select Case userPermissionLevel

    Case Creator
        'Queries everyone in the database

    Case Administrator    
        'Queries everyone in the "Department" they are a member of

    Case Supervisor
        'Queries all people WHERE supervisor = userID OR _
            supervisor IN (Select PersonID From People WHERE supervisor = userID)

    Case Custodian '(Person in charge of maintaining the HAZMAT Cabinet and SDS)
        'Queries WHERE supervisor = DLookup("Supervisor", "People", "PersonID = " & userID)

Upvotes: 0

this

Reputation: 1426

SQL is great for many things, but hierarchical data is one of bigger challenges. Some vendors has provided custom extensions to work around this (e.g. Oracle's CONNECT syntax or SQL Server's hierarchyid data type), but we probably want to keep this standard SQL¹.

What you have modeled is called "adjacency list" -- this is very simple and straightforward, and always consistent². But as you found out, this sucks for querying, especially for an unknown depth or for a subtree, rather than from the root node.

Therefore, we need to supplement this with an additional model. There are basically 3 other models that you should use in conjunction with the adjacency list model.

Nested sets
Materialized Path
Ancestry traversal closure

To study them in depth, we'll use this diagram:

For this discussion, we are also assuming this is a simple hierarchy, that there are no cycles.

Joe Celko's Nested Sets.

Basically, you store the "Left" and "Right" value of each node which indicates its position in the tree. The root node will always have 1 for "Left" and <count of nodes * 2> for "Right". This is easier to illustrate with a diagram:

Note that each node gets assigned a pair of number, one for "Left", and other for "Right". With that information, you can do some logical deductions. Finding all children becomes easy - you filter for values where the nodes' "Left" is greater than the target node's "Left" and where the same nodes' "Right" is smaller than the target node's "Right".

The biggest downside with the model is that a change to the hierarchy almost invariably requires updating the entire tree, which makes it very awful to maintain for a fast moving charts. If this is something you only update once a year, this might be acceptable.

The other issue with this model is that if there is a need for a multiple hierarchies, the nested set will not work without additional columns to track the separate hierarchy.

Materialized Path

You know how a filesystem path works, right? This is basically the same thing, except that we are storing this in the database³. For instance, a possible implementation of a materialized path might look like this:

ID  Name      Path
1   Alice     1/
2   Bob       1/2/
3   Christina 1/3/
4   Dwayne    1/4/
5   Erin      1/2/5/
6   Frank     1/2/6/
7   Georgia   1/2/7/
8   Harry     1/2/7/8/
9   Isabella  1/3/9/
10  Jake      1/3/10/
11  Kirby     1/3/10/11/
12  Lana      1/3/12/
13  Mike      1/4/13/
14  Norma     1/4/13/14/
15  Opus      1/4/15/
16  Rianna    1/4/16/

This is quite intuitive and can perform OK as long you write your SQL queries to use predicates like WHERE Path LIKE '1/4/*'. Engines will be able to use index on the path column. Note that if your queries involve querying a middle of the tree or from bottom up, that means index cannot be used and performance will suffer for it. But programming against a materialized path is pretty easy to understand. Updating a part of the tree won't propagate to unrelated nodes as the nested sets so that's also a plus in its favor.

The biggest downside is that to be indexable, the text has to be a short column. For Access database that puts a 255 character limit on your path field. Even worse, there is no good way to predict when you are about to hit the limit -- you could hit it because you have too deep tree, or because you have too wide tree (e.g. bigger numbers taking up too much spaces). For that reason, large trees might necessitate some hard-coded limit to avoid this situation.

Ancestry Traversal Closure

This model involves a separate table which is updated whenever the employee table is updated. Instead of only recording the immediate relationship, we enumerate all the ancestry between two nodes. To illustrate, this is how the table will look like:

Employee table:

ID  Name
1   Alice
2   Bob
3   Christina
4   Dwayne
5   Erin
6   Frank
7   Georgia
8   Harry
9   Isabella
10  Jake
11  Kirby
12  Lana
13  Mike
14  Norma
15  Opus
16  Rianna

Employee Ancestry Table:

Origin  Ancestor
1        1
2        1
2        2
3        1
3        3
4        1
4        4
5        1
5        2
5        5
6        1
6        2
6        6
7        1
7        2
7        7
8        1
8        2
8        7
8        8
9        1
9        3
9        9
10       1
10       3
10       10
11       1
11       3
11       10
11       11
12       1
12       3
12       12
13       1
13       4
14       1
14       4
14       13
14       14
15       1
15       4
15       15
16       1
16       4
16       16

As you see, we generate several rows worth of all possible relationship between two nodes. As a bonus because it's a table, we can make use of foreign key and cascade delete to help keep it consistent. We still have to manually manage the inserts & updates however. Because the table is also narrow, it makes it very easy to create query that can leverage index on the key, the origin and the ancestor to find the subtree, the children, the parent. This is the most flexible system at expense of extra complexity around the maintenance.

Maintaining the model

All 3 models discussed are basically denormalizing the data a bit in order to simplify the query and support an arbitrary depth search. A consequence of that is this necessitates us to manually manage the changes when the employee table is modified in some fashion.

The most simplest approach is simply to just write a VBA procedure that will truncate and re-build the entire chart using your preferred model. This can work very well when the chart is small or does not change often.

On the other end, you could consider using Data Macros on your employee table to perform the maintenance required to propagate the updates to the hierarchy. A caveat, though, if you use data macros, this makes it harder to port the data to another RDBMS system since none of those support data macros. (To be fair, the problem would still exist if you were porting from SQL Server's stored procedures/triggers to Oracle's stored procedure/triggers - those are very steeped in vendor's dialect that porting is a challenge). Using data macros or trigger + stored procedure mean that you can rely on the engine to maintain the hierarchy for you without any programming in the forms.

A common temptation is to use form's AfterUpdate event to maintain the changes and that would work.... unless someone update it outside the form. For that reason, I would actually prefer that we used a data macro rather than relying on everyone to always use the form.

Note that in all of this discussion, we should NOT discard the adjacency list model. As I commented earlier, this is the most normalized and consistent way to model the hierarchy. It is literally impossible to create a nonsensical hierarchy with it. For that reason alone, you should keep it as your "authoritative truth", which you can then build your model upon to aid the querying performance.

Another good reason to keep using the adjacency list model is regardless of which model you use above, they introduce either additional columns or additional tables that are not meant to be directly edited by users but are for purpose somewhat equivalent to a calculated field and thus should not be tinkered with. If the users are allowed to edit only the SupervisorID field, then it becomes easy to code your data macros/triggers/VBA procedure around that one field, and updating the "calculations" of the additional fields/table to ensure correctness for the queries depending on such models.

^{1. SQL Standard does describe a way to create a recursive query. However, the compliance for that particular feature seems to be poor. Furthermore, the performance may not be that great. (which is the case with SQL Server's particular implementation) The 3 models discussed are easily implemented in most of RDBMS and queries for querying the hierarchy can be easily written and ported. However, the implementation to automatically manage the changes to the hierarchy invariably requires vendor-specific dialect, using triggers or stored procedure which is not very portable.}

^{2. When I say consistent, I only mean that the model cannot create a nonsensical output. It's still possible to provide wrong data and make a weird hierarchy such as an employee's supervisor reporting to the employee, but not one that would give undefined results. However, it still is a hierarchy (even if it ends up as a cyclical graph). With other models, failing to maintain the derived data correctly means the queries will start returning undefined results.}

^{3. SQL Server's hierarchyid data type is in fact an implementation of this model.}

Upvotes: 3

Gustav

Reputation: 55841

As you probably will have a rather limited count, say six, of levels deep, you can use a query with subqueries with subqueries ... etc. Very simple.

For an unlimited number of levels, the fastest way I've found, is to create a lookup function which walks the tree for each record. This can output either the level of the record or a compound key build by the key of the record and all keys above.

As the lookup function will use the same recordset for every call, you can make it static, and (for JET) you can improve further by using Seek to locate the records.

Here's an example which will give you an idea:

Public Function RecursiveLookup(ByVal lngID As Long) As String

  Static dbs      As Database
  Static tbl      As TableDef
  Static rst      As Recordset

  Dim lngLevel    As Long
  Dim strAccount  As String

  If dbs Is Nothing Then
    ' For testing only.
    ' Replace with OpenDatabase of backend database file.
    Set dbs = CurrentDb()
    Set tbl = dbs.TableDefs("tblAccount")
    Set rst = dbs.OpenRecordset(tbl.Name, dbOpenTable)
  End If

  With rst
    .Index = "PrimaryKey"
    While lngID > 0
      .Seek "=", lngID
      If Not .NoMatch Then
        lngLevel = lngLevel + 1
        lngID = !MasterAccountFK.Value
        If lngID > 0 Then
          strAccount = str(!AccountID) & strAccount
        End If
      Else
        lngID = 0
      End If
    Wend
    ' Leave recordset open.
    ' .Close
  End With

'  Don't terminate static objects.
'  Set rst = Nothing
'  Set tbl = Nothing
'  Set dbs = Nothing

'  Alternative expression for returning the level.
'  (Adjust vartype of return value of function.) '  RecursiveLookup = lngLevel ' As Long
  RecursiveLookup = strAccount

End Function

This assumes a table with a primary key ID and a foreign (master) key pointing to the parent record - and a top level record (not used) with a visible key (AccountID) of 0.

Now your tree will be nicely shown almost instantaneously using a query like this where Account will be the visible compound key:

  SELECT
    *, RecursiveLookup([ID]) AS Account
  FROM
    tblAccount
  WHERE
    (AccountID > 0)
  ORDER BY
    RecursiveLookup([ID]);

If you wish to use this to add records to another table, you should not make an SQL call for each, as this is very slow, but first open a recordset, then use AddNew-Update to append each record and, finally, close this recordset.

Upvotes: 1

Lee Mac

Reputation: 16015

Consider the following set of functions:

Function BuildQuerySQL(lngsid As Long) As String
    Dim intlvl As Integer
    Dim strsel As String: strsel = selsql(intlvl)
    Dim strfrm As String: strfrm = "people as p0 "
    Dim strwhr As String: strwhr = "where p0.supervisor = " & lngsid

    While HasRecordsP(strsel & strfrm & strwhr)
        intlvl = intlvl + 1
        BuildQuerySQL = BuildQuerySQL & " union " & strsel & strfrm & strwhr
        strsel = selsql(intlvl)
        If intlvl > 1 Then
            strfrm = "(" & strfrm & ")" & frmsql(intlvl)
        Else
            strfrm = strfrm & frmsql(intlvl)
        End If
    Wend
    BuildQuerySQL = Mid(BuildQuerySQL, 8)
End Function

Function HasRecordsP(strSQL As String) As Boolean
    Dim dbs As DAO.Database
    Set dbs = CurrentDb
    With dbs.OpenRecordset(strSQL)
        HasRecordsP = Not .EOF
        .Close
    End With
    Set dbs = Nothing
End Function

Function selsql(intlvl As Integer) As String
    selsql = "select p" & intlvl & ".personid from "
End Function

Function frmsql(intlvl As Integer) As String
    frmsql = " inner join people as p" & intlvl & " on p" & intlvl - 1 & ".personid = p" & intlvl & ".supervisor "
End Function

Here, the BuildQuerySQL function may be supplied with the PersonID corresponding to a Supervisor and the function will return 'recursive' SQL code for an appropriate query to obtain the PersonID for all subordinates of the supervisor.

Such function may therefore be evaluated to construct a saved query, e.g. for a supervisor with PersonID = 5, creating a query called Subordinates:

Sub test()
    CurrentDb.CreateQueryDef "Subordinates", BuildQuerySQL(5)
End Sub

Or the SQL may be evaluated to open a RecordSet of the results perhaps, depending on the requirements of your application.

Note that the function constructs a UNION query, with each level of nesting unioned with the previous query.

Upvotes: 0

MS-Access SQL Self Join with multiple levels

Answers (4)

Related Questions