Reputation: 1049
I'm writing a data tree structure that is combined from a Tree and a TreeNode. Tree will contain the root and the top level actions on the data. I'm using a UI library to present the tree in a windows form where I can bind the tree to the TreeView.
I will need to save this tree and nodes in the DB. What will be the best way to save the tree and to get the following features:
I had 2 ideas. The first is to serialize the data into a one liner in a table. The second is to save in tables but then, when moving to data entities I will loose the row states on the table on changed nodes.
Any ideas?
Upvotes: 72
Views: 79138
Reputation: 147
Hello, I just got a handle on this for a project I'm working on and figured I'd share my write-up Hope this helps. Let's get started with some prereqs
This is essentially the closure table
solution mentioned above Using recursive calls. Thanks for those slides they are very useful I wish i saw them before this write up :)
these are functions that call themselves ie
function factorial(n) {
if (n = 0) return 1; //base case
return n * factorial(n - 1); // recursive call
}
This is pretty cool luckily pgsql has recursive functions too but it can be a bit much. I prefer functional stuff cte with pgsql
WITH RECURSIVE t(n) AS (
VALUES (1) -- nonrecusive term
UNION ALL
SELECT n+1 FROM t WHERE n < 100 -- recusive term
--continues until union adds nothing
)
SELECT sum(n) FROM t;
The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query's own output. Such a query is executed as follows:
Recursive Query Evaluation
Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
So long as the working table is not empty, repeat these steps:
a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
to do something like factorial in sql you need to do something more like this so post
ALTER FUNCTION dbo.fnGetFactorial (@num int)
RETURNS INT
AS
BEGIN
DECLARE @n int
IF @num <= 1 SET @n = 1
ELSE SET @n = @num * dbo.fnGetFactorial(@num - 1)
RETURN @n
END
GO
The import thing to note is that a tree is a subset of a graph, This can be simply enforced by the relationship each node has only one parent.
I think it will be easiest to work it out a little more theoretically before we move on to the sql
The simple way of represent a graph relation without data duplication is by separating the nodes(id, data)
from the edges.
We can then restrict the edges(parent_id, child_id)
table to enforce our constraint. be mandating that parent_id,child_id
as well as just child id be unique
create table nodes (
id uuid default uuid_generate_v4() not null unique ,
name varchar(255) not null,
json json default '{}'::json not null,
remarks varchar(255),
);
create table edges (
id uuid default uuid_generate_v4() not null,
parent_id uuid not null,
child_id uuid not null,
meta json default '{}'::json,
constraint group_group_id_key
primary key (id),
constraint group_group_unique_combo
unique (parent_id, child_id),
constraint group_group_unique_child
unique (child_id),
foreign key (parent_id) references nodes
on update cascade on delete cascade,
foreign key (child_id) references nodes
on update cascade on delete cascade
);
Note that theoretical this can all be done with only one table by simply putting the parent_id in the nodes table and then
CREATE VIEW v_edges as (SELECT id as child_id, parent_id FROM nodes)
but for the proposal of flexibility and so that we can incorporate other graph structures to this framework I will use the common many-to-many relationship structure. This will ideally allow this research to be expanded into other graph algorithms.
Let's start out with a sample data structure
INSERT (id, my_data) VALUES ('alpha', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('bravo', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('charly', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('berry', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('zeta', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('yank', 'my big data') INTO nodes
INSERT (parent_id, child_id) VALUES ('alpha', 'bravo') INTO edges
INSERT (parent_id, child_id) VALUES ('alpha', 'berry') INTO edges
INSERT (parent_id, child_id) VALUES ('bravo', 'charly') INTO edges
INSERT (parent_id, child_id) VALUES ('yank', 'zeta') INTO edges
-- rank0 Alpha Yank
-- rank1 Bravo Berry Zeta
-- rank2 Charly
Note the interesting properties of a tree (number of edges e) =( number of nodes n)-1
each child has exactly one parent.
We can then simplify the equations
let n = node
let p = parent
let c = child
let ns = nodes = groups
let es = edges = group_group // because this is a relationship of a group entity to another group entity
So now what sort of questions will we ask.
"Given an arbitrary set of groups 's' what is the coverage of the graph assuming nodes inherit their children?"
This is a tricky question, it requires us to traverse the graph and find all children of each node in s
This continues off of this stack overflow post
-- some DBMS (e.g. Postgres) require the word "recursive"
-- some others (Oracle, SQL-Server) require omitting the "recursive"
-- and some (e.g. SQLite) don't bother, i.e. they accept both
-- drop view v_group_descendant;
create view v_group_descendant as
with recursive descendants -- name for accumulating table
(parent_id, descendant_id, lvl) -- output columns
as
( select parent_id, child_id, 1
from group_group -- starting point, we start with each base group
union all
select d.parent_id, s.child_id, d.lvl + 1
from descendants d -- get the n-1 th level of descendants/ children
join group_group s -- and join it to find the nth level
on d.descendant_id = s.parent_id -- the trick is that the output of this query becomes the input
-- Im not sure when it stops but probably when there is no change
)
select * from descendants;
comment on view v_group_descendant is 'This aggregates the children of each group RECURSIVELY WOO ALL THE WAY DOWN THE TREE :)';
after we have this view we can join with our nodes/groups to get out data back i will not provide these samples for every single step for the most part we will just work with ids.
select d.*, g1.group_name as parent, g2.group_name as decendent --then we join it with groups to add names
from v_group_descendant d, groups g1, groups g2
WHERE g1.id = d.parent_id and g2.id = d.descendant_id
order by parent_id, lvl, descendant_id;
sample output
+------------------------------------+------------------------------------+---+----------+---------+
|parent_id |descendant_id |lvl|parent |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1 |bravo |charly |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1 |alpha |bravo |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1 |alpha |berry |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2 |alpha |charly |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1 |yank |zeta |
+------------------------------------+------------------------------------+---+----------+---------+
Note that this is just the minimal node descendant relationship and has actual lost all nodes with 0 children such as charly.
In order to resolve this we need to add all nodes back which don't appear in the descendants list
create view v_group_descendant_all as (
select * from v_group_descendant gd
UNION ALL
select null::uuid as parent_id,id as descendant_id, 0 as lvl from groups g
where not exists (select * from v_group_descendant gd where gd.descendant_id = g.id )
);
comment on view v_group_descendant is 'complete list of descendants including rank 0 root nodes descendant - parent relationship is duplicated for all levels / ranks';
preview
+------------------------------------+------------------------------------+---+----------+---------+
|parent_id |descendant_id |lvl|parent |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1 |bravo |charly |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1 |alpha |bravo |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1 |alpha |berry |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2 |alpha |charly |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1 |yank |zeta |
|null |c1529e8a-75b0-4242-a51a-ac60a0e48868|0 |null |alpha |
|null |42529e8a-75b0-4242-a51a-ac60a0e48868|0 |null |yank |
+------------------------------------+------------------------------------+---+----------+---------+
Lets say for example we are getting our set s of groups bases on a users(id , data)
table with a user_group(user_id, group_id)
relation
We can then join this to another table removing duplicates because our set s
of user_group relations may cause
duplicates if a users is say assigned to both alpha assigned charly
+------+--------+
| user | group |
+------+--------+
| jane | alpha |
| jane | charly |
| kier | yank |
| kier | bravo |
+------+--------+
--drop view v_user_group_recursive;
CREATE VIEW v_user_group_recursive AS (
SELECT DISTINCT dd.descendant_id AS group_id, ug.user_id
FROM v_group_descendant_all dd , user_group ug
WHERE (ug.group_id = dd.descendant_id
OR ug.group_id = dd.parent_id) -- should gic
);
SELECT * FROM v_user_group_recursive;
+------+--------+
| user | group |
+------+--------+
| jane | alpha |
| jane | bravo |
| jane | berry |
| jane | charly |
-- | jane | charly | Removed by DISTINCT
| kier | yank |
| kier | zeta |
| kier | bravo |
| kier | charly |
+------+--------+
If we want we can now group by node and join we can do somthing k like the fallowing
CREATE VIEW v_user_groups_recursive AS (
SELECT user_id, json_agg(json_build_object('id', id,'parent_id',parent_id, 'group_name', group_name, 'org_id', org_id, 'json', json, 'remarks', remarks)) as groups
FROM v_user_group_recursive ug, v_groups_parent g
WHERE ug.group_id = g.id GROUP BY user_id
);
comment on view v_user_group_recursive is 'This aggregates the groups for each user recursively ';
+------+-------------------------------+
| user | groups |
+------+-------------------------------+
| jane | [alpha, bravo, berry, charly] |
| kier | [yank, zeta, bravo, charly] |
+------+-------------------------------+
This is awesome we have answered the question. We now can simply ask which groups this use inherits
SELECT * from v_user_groups_recursive where user_id = 'kier
And further we could use somthing like jstree.com to display our structure
async function getProjectTree(user_id) {
let res = await table.query(format('SELECT * from v_user_groups_recursive ug WHERE ug.user_id = %L', user_id));
if (res.success) {
let rows = res.data[0].groups.map(r => {
return {
id: r.id, // required
parent: r.parent_id==null?'#':r.parent_id,// required
text: r.group_name,// node text
icon: 'P', // string for custom
state: {
opened: true, // is the node open
disabled: false, // is the node disabled
selected: false, // is the node selected
},
li_attr: {}, // attributes for the generated LI node
a_attr: {} // attributes for the generated A node
}
})
return {success: true, data: rows, msg: 'Got all projects'}
} else return res;
}
<div id="v_project_tree" class="row col-10 mx-auto" style="height: 25vh"></div>
<script>
function buildTree() {
bs.sendJson('get', "/api/projects/getProjectTree").then(res => {
bs.resNotify(res);
if (!res.success) {
//:(
console.error(':(');
return
}
console.log(res.data);
$('#v_project_tree').jstree({
'core': {
'data': res.data
}
});
})
}
window.addEventListener('load', buildTree);
</script>
Upvotes: 4
Reputation: 101
As this is the top answer when asking "sql trees" in a google search, I will try to update this from the perspective of today (december 2018).
Most answers imply that using an adjacency list is both simple and slow and therefore recommend other methods.
Since version 8 (published april 2018) MySQL supports recursive common table expressions (CTE). MySQL is a bit late to the show but this opens up a new option.
There is a tutorial here that explains the use of recursive queries to manage an adjacency list.
As the recursion now runs completely within the database engine, it is way much faster than in the past (when it had to run in the script engine).
The blog here gives some measurements (which are both biased and for postgres instead of MySQL) but nevertheless it shows that adjacency lists do not have to be slow.
So my conclusion today is:
Upvotes: 5
Reputation: 3433
If you are using PostgreSQL you can use ltree
, a package in the contrib extension (comes by default) which implements the tree data structure.
From the docs:
CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING GIST (path);
CREATE INDEX path_idx ON test USING BTREE (path);
You can do queries like:
ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(4 rows)
Upvotes: 10
Reputation: 425261
The easiest implementation is adjacency list structure:
id parent_id data
However, some databases, particularly MySQL
, have some issues in handling this model, because it requires an ability to run recursive queries which MySQL
lacks.
Another model is nested sets:
id lft rgt data
where lft
and rgt
are arbitrary values that define the hierarchy (any child's lft
, rgt
should be within any parent's lft
, rgt
)
This does not require recursive queries, but it slower and harder to maintain.
However, in MySQL
this can be improved using SPATIAL
abitilies.
See these articles in my blog:
for more detailed explanations.
Upvotes: 44
Reputation: 1183
I've bookmarked this slidshare about SQL-Antipatterns, which discusses several alternatives: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed
The recommendation from there is to use a Closure Table (it's explained in the slides).
Here is the summary (slide 77):
| Query Child | Query Subtree | Modify Tree | Ref. Integrity
Adjacency List | Easy | Hard | Easy | Yes
Path Enumeration | Easy | Easy | Hard | No
Nested Sets | Hard | Easy | Hard | No
Closure Table | Easy | Easy | Easy | Yes
Upvotes: 71
Reputation: 29922
I'm suprised that nobody mentioned the materialized path solution, which is probably the fastest way of working with trees in standard SQL.
In this approach, every node in the tree has a column path, where the full path from the root to the node is stored. This involves very simple and fast queries.
Have a look at the example table node:
+---------+-------+
| node_id | path |
+---------+-------+
| 0 | |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1.4 |
| 5 | 2.5 |
| 6 | 2.6 |
| 7 | 2.6.7 |
| 8 | 2.6.8 |
| 9 | 2.6.9 |
+---------+-------+
In order to get the children of node x, you can write the following query:
SELECT * FROM node WHERE path LIKE CONCAT((SELECT path FROM node WHERE node_id = x), '.%')
Keep in mind, that the column path should be indexed, in order to perform fast with the LIKE clause.
Upvotes: 13
Reputation: 837986
It depends on how you will be querying and updating the data. If you store all the data in one row, it's basically a single unit that you can't query into or partially update without rewriting all the data.
If you want to store each element as a row, you should first read Managing Hierarchical Data in MySQL (MySQL specific, but the advice holds for many other databases too).
If you're only ever accessing an entire tree, the adjacency list model makes it difficult to retrieve all nodes under the root without using a recursive query. If you add an extra column that links back to the head then you can do SELECT * WHERE head_id = @id
and get the whole tree in one non-recursive query, but it denormalizes the database.
Some databases have custom extensions that make storing and retrieving heirarchical data easier, for example Oracle has CONNECT BY.
Upvotes: 7
Reputation: 12867
The best way, I think indeed is to give each node an id and a parent_id, where the parent id is the id of the parent node. This has a couple of benefits
Upvotes: 1
Reputation: 39548
Something like table "nodes" where each node row contains parent id (in addition to the ordinary node data). For root, the parent is NULL.
Of course, this makes finding children a bit more time consuming, but this way the actual database will be quite simple.
Upvotes: 0