Reputation: 13465
I have a table like
create table site
(
site_Id int(5),
parent_Id int(5),
site_desc varchar2(100)
);
Significance of the fields:
The requirement is that if I have a site_id as an input, and I need all the ids tagged below the site. For Example:
A
/ \
B C
/ | \ /\
D E F G H
/\
I J
All the nodes are the site_Id.
The table contains data like this:
Site_id | Parent_ID | site_desc
_________|____________|___________
A | -1 |
B | A |
C | A |
D | B |
E | B |
F | B |
I | D |
J | D |
......
A is the parent of B and C and so on.
If B is the input given then the query need to fetch D, E, I, F, J
It is currently achieved through multiple queries in a loop, but I was thinking to achieve this in a minimum number of queries.
What I am currently doing is::
down vote
The algorithm goes like this :
I need the best optimized technique within my data model constraint.
Upvotes: 15
Views: 9269
Reputation: 562881
Unfortunately, if you can't change the data model, and you're using MySQL, you're stuck in a situation where you need recursive queries and you're using a DBMS that doesn't support recursive queries.
Quassnoi wrote an interesting series of blog articles, showing techniques for querying hierarchical data. His solutions are quite clever, but very complex. http://explainextended.com/2009/03/17/hierarchical-queries-in-mysql/
PostgreSQL is another open-source RDBMS, which does support recursive queries, so you could fetch a whole tree stored in the way you show. But if you can't change the data model, I'd assume you can't switch to a different RDBMS.
There are several alternative data models that make it much easier to fetch arbitrarily-deep trees:
I cover these in my presentation Models for Hierarchical Data with SQL and PHP, and in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Finally, there's another solution that I've seen used in the code for Slashdot, for their comments hierarchies: They store "parent_id" like in Adjacency List, but they also store a "root_id" column. Every member of a given tree has the same value for root_id, which is the highest ancestor node in its tree. Then it's easy to fetch a whole tree in one query:
SELECT * FROM site WHERE root_id = 123;
Then your application fetches all the nodes back from the database into an array, and you have to write the code to loop over this array, inserting the nodes into a tree data structure in memory. This is a good solution if you have many separate trees, and each tree has relatively few entries. It's good for Slashdot's case.
Upvotes: 10
Reputation: 1867
i also asked myself how to query relationships recursively, and my brain generated this solution (:
SELECT * FROM
(
SELECT t2.* FROM table t1, table t2 where t2.parent = t1.id OR t2.parent 0 GROUP BY t2.id, t2.parent
) as all_relations
WHERE all_relations.parent >= '_the_id_'
# if you dont want a subtree use only the inner select
I am not 100% sure, but i think as long as the id is auto incremented and a child never has a smaller id as his parent (that should be the normal case), then this could be a solution?
Upvotes: 1
Reputation: 42184
Based on your comments here I am assuming you are unwilling to change the existing data model because hundreds of applications are using that (and will break if you replace it with something else).
The root of the problem is that for any site, we only know it's direct parent, so we need to lookup the parent of that parent recursively until we found the root site.
If you can get away with a limit to the depth / level to which sites can be nested, you can write one great query that does all the work for you and probably isn't even that slow to boot. Most overhead from firing queries comes from setting up the connection, network bandwidth etc. MySQL can be very quick.
Firing multiple queries multiplies all the overhead, so we don't want that. Doing a SELECT * and then computing in the application logic means you will fetch all data every time, maximizing network overhead, so we don't want that.
If a limit to the depth of the tree is acceptable, you can combine multiple queries into one huge query that does all the work and returns the exact resultset you need. As an example I used your data, but with A, B, C etc replaced with 1, 2, 3 (as your columns are int).
To get all direct children of the root node (with site_id = 1) do this:
select site_id from site where parent_id = 1
To get the grandchildren of the root node, do this:
select grandchild.site_id
from site grandchild, site child
where grandchild.parent_id = child.site_id
and child.parent_id = 1
To get the great-grandchildren of the root node, do this:
select greatgrandchild.site_id
from site greatgrandchild, site grandchild, site child
where greatgrandchild.parent_id = grandchild.site_id
and grandchild.parent_id = child.site_id
and child.parent_id = 1
To get all descendants of the root node, just combine the above queries into one huge query, like this:
select site_id
from site
where site_id in (
select site_id
from site
where parent_id = 1
)
or site_id in (
select grandchild.site_id
from site grandchild, site child
where grandchild.parent_id = child.site_id
and child.parent_id = 1
)
or site_id in (
select greatgrandchild.site_id
from site greatgrandchild, site grandchild, site child
where greatgrandchild.parent_id = grandchild.site_id
and grandchild.parent_id = child.site_id
and child.parent_id = 1
)
I think you see how this is working. For each extra level, create a query that finds the nodes which are that many levels away from the site you are searching descendants for and add that query to the super-query with an extra 'or site_id in ()'...
Now as you can see, just for three levels, this already becomes a big query. If you need to support, say, 10 levels, this query will become huge and all the OR's and IN's in it will slow it down... However, it still will probably be faster then just getting everything or using multiple queries. If you need to support an arbitrary amount of possible levels than this query cannot help you. It would have to become infinitely large. In that case all that remains is to use a better way...
That said, and before you copy paste this and start coding, there is a way that avoids such huge queries, supporting arbitrary depths and without breaking backward compatibility. It does require a change to the data model but it's a small one that won't hurt the other programs using this data model. There is in short...
A BETTER WAY
Add an extra column parent_paths, using something like ravnur mentioned in his answer to encode the full path from each node all the way up to the root
Fill that column dynamically using triggers on insert, update and delete. You are now maintaining redundant data. It won't hurt other programs but can give a significant performance benefit for yours. Make sure your triggers are bullet-proof though (that's the hardest part probably) as the data in the extra column should always be in synch with the regular data in the table
Use a short and sweet query like the one ravnur showed that looks for the occurance of the site_id at any place in the parent_paths column to directly get all descendants of the site with that site_id without any recursion.
Upvotes: 2
Reputation: 1547
You might want to have a look at the closure table pattern. I found this site informative. As far as I have seen, ther are also several StackOverflow questions about this concept, e.g., here.
Upvotes: 3
Reputation: 1240
You can create a stored procedure for this.
Here is my implementation in mysql
DROP PROCEDURE IF EXISTS SearchTree;
DELIMITER go
CREATE PROCEDURE SearchTree( IN root CHAR(1) )
BEGIN
DECLARE rows SMALLINT DEFAULT 0;
DROP TABLE IF EXISTS reached;
CREATE TABLE reached (
site_Id CHAR(1) PRIMARY KEY
) ENGINE=HEAP;
INSERT INTO reached VALUES (root);
SET rows = ROW_COUNT();
WHILE rows > 0 DO
INSERT IGNORE INTO reached
SELECT DISTINCT s.site_Id
FROM site AS s
INNER JOIN reached AS r ON s.parent_Id = r.site_Id;
SET rows = ROW_COUNT();
DELETE FROM reached
WHERE site_Id = root;
END WHILE;
SELECT * FROM reached;
DROP TABLE reached;
END;
go
DELIMITER ;
CALL SearchTree('B');
It returns the expected result.
Upvotes: 2
Reputation: 23135
Yesterday, I had answered this question which is exactly related to your problem you've described: Out of a given adjacency list, you want to get all child nodes of a particular parent - and perhaps in a one-dimensional array that you can easily iterate over.
You can do this using only one call to the database, but there's sort of a catch: you have to return all rows from the table. MySQL doesn't support recursive queries, so instead, you essentially have to do the SELECT
ing in the application code.
I'm simply going to reiterate my answer that I linked to above, but basically if you return a result-set (perhaps from PDOStatement->fetchAll(PDO::FETCH_ASSOC)
or other methods) in the format of something like:
Array
(
[0] => Array
(
[site_id] => A
[parent_id] => -1
[site_desc] => testtext
)
[1] => Array
(
[site_id] => B
[parent_id] => A
[site_desc] => testtext
)
[2] => Array
(
[site_id] => C
[parent_id] => A
[site_desc] => testtext
)
[3] => Array
(
[site_id] => D
[parent_id] => B
[site_desc] => testtext
)
[4] => Array
(
[site_id] => E
[parent_id] => B
[site_desc] => testtext
)
[5] => Array
(
[site_id] => F
[parent_id] => B
[site_desc] => testtext
)
[6] => Array
(
[site_id] => I
[parent_id] => D
[site_desc] => testtext
)
[7] => Array
(
[site_id] => J
[parent_id] => D
[site_desc] => testtext
)
)
You can retrieve all children/grandchildren/greatgrandchildren/so-on of any site_id
(provided you know the id) using this recursive function:
function fetch_recursive($src_arr, $id, $parentfound = false, $cats = array())
{
foreach($src_arr as $row)
{
if((!$parentfound && $row['site_id'] == $id) || $row['parent_id'] == $id)
{
$rowdata = array();
foreach($row as $k => $v)
$rowdata[$k] = $v;
$cats[] = $rowdata;
if($row['parent_id'] == $id)
$cats = array_merge($cats, fetch_recursive($src_arr, $row['site_id'], true));
}
}
return $cats;
}
For example, say you wanted to retrieve all children of site_id
D
, you would use the function like so:
$nodelist = fetch_recursive($pdostmt->fetchAll(PDO::FETCH_ASSOC), 'D');
print_r($nodelist);
Would output:
[0] => Array
(
[site_id] => D
[parent_id] => B
[site_desc] => testtext
)
[1] => Array
(
[site_id] => I
[parent_id] => D
[site_desc] => testtext
)
[2] => Array
(
[site_id] => J
[parent_id] => D
[site_desc] => testtext
)
Notice we retain the information of the parent, along with its children, grandchildren, and so on (however deep the nesting goes).
Upvotes: 8
Reputation: 57428
Others have already proposed how to do this with slight modifications to the table structure.
If you do not want to modify the structure (even if this would be best), then you can do like this:
It can usually be safely assumed that once assigned, IDs do not change; if the IDs do not get shuffled around, i.e., node C is not moved under node B, then it will be true that child nodes have always higher IDs than their parents, and the sorting above will guarantee that all parents gets fetched before their children.
So these are the hypotheses:
- we prefer not to change the table layout
- we never change the IDs once assigned
- we never reorder the tree, moving IDs around
Therefore, it becomes possible to create the tree in memory (and even reduce the query itself adding a WHERE Site_ID >= B).
The first node to come through will be B's and will be put into the tree.
All subsequent nodes may be stored in their Parent_ID-th node, which has surely been loaded before.
This would go quite well in Python (you directly modify the parent node).
The request "Get all descendants of B" may be answered in PHP like this:
$nodes = array( $parent_id );
$cursor = SQLQuery("SELECT * FROM site WHERE Site_ID > ? "
. "ORDER BY Parent_ID, Site_Id ;", $parent_id);
while ($tuple = SQLFetchTuple($cursor))
if (in_array($tuple['Parent_ID'], $nodes))
$nodes[] = $tuple['Site_Id'];
SQLFree($cursor);
// The first node is the global parent, and may be array_shift'ed away
// if desired.
Another way
quite brute force
Another possibility is to store the "descendant_of" relation recursively in another table:
TRUNCATE descendants;
INSERT INTO descendants ( node, of ) VALUES ( -1, NULL );
INSERT INTO descendants SELECT SiteId, ParentId FROM site JOIN
descendants ON ( site.ParentId = descendants.of );
And repeat the INSERTs until number of inserted rows is equal to zero (or total number of rows in descendants stops increasing; querying table size is very fast in most DBs).
At this point you will have stored all one-level relations. Now:
INSERT IGNORE INTO descendants SELECT s1.node, s2.of FROM
descendants AS s1 JOIN descendants AS s2 ON (s1.of = s2.node);
...again until descendants stops increasing (it will require a number of inserts equal to the maximum number of levels). Total number of JOINs will be twice the number of levels.
Now if you want to get all descendants of node 16, you simply query
SELECT node FROM descendants WHERE of = 16;
Upvotes: 2
Reputation: 58454
To begin with, i would recommend a bit different method for storing a tree: Closure Table. If you want to know more about it, you could find SQL Antipatterns book quite interesting.
That said. The easiest way, in my opinion, to generate such structure is this: http://jsbin.com/omexix/3/edit#javascript
I hope you have no problems reading JavaScript code. I used it because creation of unclassified objects in JavaScript doesn't look so hack-ish. It is possible to implement same without relaying on object (or references) by using multidimensional array, but it kinda looks confusing.
Here is what algorithm does:
This is about it. Basically you generate two lists: with all the nodes, and with only root node.
Upvotes: 3
Reputation: 2852
If you do not update your site
table often you can use the following strategy:
create table site
(
site_Id int(5),
parent_Id int(5),
site_desc varchar2(100),
parents_path varchar(X)
);
parents_path
equals path to selected node from root. For example, for leaf J
it should be |A|B|D|
.
Pros: - you will need single query to get the result;
Cons: - more query during updates (but you can do updates wisely);
Hope it helps
Upvotes: 2
Reputation: 209965
Check out the nested set model, if you want to be able to do this in single queries: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
Another alternative is to include all relationships in a linking table. So every site would have a link to its parent, its grandparent, etc. Every relationship is explicit. Then you just query that linkage table to get all descendants.
Upvotes: 5