Reputation: 1763

How to efficiently query a directed acyclic graph

I'm using mysql for one on my web applications. The application table contains a supervisor table and an employee table. employee table contains information about each employee. The supervisor table contains two columns as follows.

supervisor_id -> which is employee id of the supervisor
subordinate_id -> which is the employee id of the subordinate.

Each subordinate can have multiple supervisors and one supervisors subordinates can be a supervisor of of some other employee. So the table records can be ass follows.

supervisor_id | subordinate_id
1             | 2
1             | 3
2             | 4
4             | 5
3             | 6
3             | 4

In the above example there is a supervisor chain. Supervisor 1 has 2, 3, 4, 5 and 6 as his subordinates. Supervisor 2 has 4, 5 as subordinates. And also it can have multiple supervisors for a subordinate.

When I querying for all subordinates for supervisor 2 currently I use a query like following.

public function getSubordinate($id) {
 $query = "SELECT * FROM supervisor WHERE subordinate_id = $id";
 // get results and return
}

So what I'm currently do is first send the id as 2 to get its immediate subordinates. Then for each and every resulting subordinates I run the query again and again to get the full subordinate chain.

This is okay with small set of data. But this supervisor table will have thousands of data so I have to do thousands of queries to find the supervisor chain and it takes times to give results.

Since subordinates can have multiple supervisors the nested set will not be a exact answer for this.

I went through this solution also . http://www.codeproject.com/Articles/22824/A-Model-to-Represent-Directed-Acyclic-Graphs-DAG-o

But when I use this method it will have millions of data with that table. and it is inefficient.

My problem is there any efficient way to do this. Is there any problem with my table structure which prevent me to do this kind of query efficiently.

Upvotes: 8

Answers (2)

MountainX

Reputation: 6347

All of the major database applications (including MySQL and MariaDB) now support recursive queries using Common Table Expressions. This was introduced in MySQL version 8.0 and MariaDB version 10.2.2. PostgreSQL had support even earlier. Oracle has it, and SQL Server added it with the 2005 version. In fact, a quick search suggests that Sqlite supports Common Table Expressions too.

Therefore, the answer you may be looking for is to use Common Table Expressions and recursive queries. Somes reasons why that is considered a better solution compared to "A Model to Represent Directed Acyclic Graphs (DAG) on SQL Databases" are explained here:

Encoding and Querying Graphs in the Relational Model
https://drtom.ch/posts/2012/02/11/Encoding_and_Querying_Graphs_in_the_Relational_Model/

(You can ignore the part where he says, "it won't run in particular on MySQL or sqlite3, which simply do not support CTEs." As I mentioned, that's no longer the case.)

As you noted in your question, "when I use this method it will have millions of data with that table." That alone might not be so bad if you were trading space for efficiency all around, but as Dr. Tom's post explains in one example:

The operation of deleting or inserting the red arc requires effort in θ(n^2) too.

That's n-squared effort for those operations; you gain query efficiency, but at the cost of space inefficiency and insertion/deletion inefficiency. He further goes on to point out that

practically all large real world networks are sparse. They have much fewer edges than there would be possible, i.e. m«n^2.

In fairness, the Code Project article by Kemal Erdogan that you linked is from 2008; CTE's were not universally available then. Furthermore, Erdogan made an informed choice in regard to the trade-off's as he explained here:

The solution that I have is based on a recursion [too]. However, instead of deferring the recursion until query time, I do the recursion at the insertion time, assuming that the graph is actually more queried than modified (which is true for all the cases that I have faced so far).

If, after reading Dr. Tom's article, you ultimately prefer Erdogan's trade-offs, you may be able to limit the other inefficiencies by taking a look at Laravel's implementation here:

GitHub - telkins/laravel-dag-manager: A SQL-based Directed Acyclic Graph (DAG) solution for Laravel. https://github.com/telkins/laravel-dag-manager

In particular, look at Max Hops and implement something like that in your own solution.

This is in the Laravel config file:

/*
|--------------------------------------------------------------------------
| Max Hops
|--------------------------------------------------------------------------
|
| This value represents the maximum number of hops that are allowed where
| hops "[i]ndicates how many vertex hops are necessary for the path; it is
| zero for direct edges".
|
| The more hops that are allowed (and used), then the more DAG edges will
| be created.  This will have an increasing impact on performance, space,
| and memory.  Whether or not it's negligible, noticeable, or impactful
| depends on a variety of factors.
*/

'max_hops' => 5,

Disclaimer: I am only now researching this for myself. I do not yet have the benefit of experience with any of these solutions.

Upvotes: 2

Bingo

Reputation: 64

You are saying acyclic graph, so maybe I'm way off here - but it sounds at the same time like you need something for a normal supervisors and employee hierarchy ? so can it be done with a tree structure ?

I am not sure but it sounds like you only need a tree structure ?? and i think the easiest way to pull out who is over one person is to store all the names in one table, and use two field to update the relationship between the people. The fields would be left and right.

                              _______  
                           1 | peter | 20
                              _______
             ______                        ______
          2 | paul | 17                18 | john | 19
             ______                        ______
    _____            _______
 3 |judas | 4      5 | maria | 16
    _____            _______


               _____             ________
            6 |seth  | 7      8 | abraham | 15
               _____             _______

                                ______          
                              9 |bill | 14
                                 _____

                          _____                _______
                      10 |kenny | 11      12 | moses | 13
                          _____                _______

Who is the bosses of Moses ? everybody with a higher right and lover left gives you Bill, Abraham, Maria, Paul and Peter :-) This takes no time at all in a database to pull out. If this is interesting I can update this answer with details on how to do this.

 table  left   right

 peter  1      20
 paul   2      7
 judas  3      4
 maria  5      16
 seth   6      7
 ... etc


 select * from people where left < 12 and right > 13

result:

 bill     9     14
 abraham  8     15
 maria    4     16
 paul     2     17
 peter    1     20

Upvotes: 0

How to efficiently query a directed acyclic graph

Answers (2)

Related Questions