Reputation:
I'm new to SQL and ran into this problem, the idea is to flatten the hierarchy. This is the input table, where each child has a parent. The objective is to find the ultimate parent. Input:
Parent | Child
-------|------
A | B
B | C
C | D
E | F
F | G
Output
Parent | Child | Ultimate Parent
-------|------ |----------------
A | B | A
B | C | A
C | D | A
E | F | E
F | G | E
I can't seem to find a way to do this using SQL. Any help appreciated.
Upvotes: 1
Views: 1270
Reputation: 339
If you have control over the input, then you can use a "Modified Preorder Tree Traversal" and 2 numeric columns to represent the scope of each node of the tree. This means no recursion, and not having to know the depth of the tree in advance.
Upvotes: 0
Reputation: 2449
You can use recursive query like below and this implementation will return every child's root parent and it doesn't matter how many root parent exists:
with RECURSIVE tree as(
select t1.*, t1.child as Ultimate_Parent
from table_name t1
where parent is null
union all
select t1.*, t.Ultimate_Parent
from table_name t1
join tree t on t.child = t1.parent
)
select * from tree;
and I supposed your data is like below:
Parent | Child |
---|---|
Null | A |
A | B |
B | C |
C | D |
Null | E |
E | F |
F | G |
and also below image is implemented in the SQL server:
Upvotes: 0
Reputation: 4062
Since OP asked for a Procedure-Less-Solution for multiple root-nodes after resolving the ambiguous question, the part right below was added. It represents an experimental addition to the previous answer, that was based on the "one root idea", see below.
Why Semi-Automatic?
Because you have to know the level of nesting beforehand and you would also have to adjust the numbers of helper-tables that you want to use including altering the final 'absparent' table update logic.
How does it work?
The idea behind this approach is that we first identify the root nodes and save them to a table (t1).
Next, we want to find the nodes that sit one layer below the root (t2) - we already store the children along in (t1). So we fill the (t2) table with the information provided from (t1) and the main table (absparent).
We then repeat this for all n-layers that should be supported.
In the update-logic, we connect the helper-tables and find the (t1) root node i.e. the ultimate parent.
I used https://sqliteonline.com for testing.
The output would look like this:
Table t1:
id | parent | child |
---|---|---|
1 | A | B |
4 | E | F |
Table t2:
id | parent | child |
---|---|---|
2 | B | C |
5 | F | G |
Table t3:
id | parent | child |
---|---|---|
3 | C | D |
Table 'absparent':
parent | child | ultimate_parent |
---|---|---|
A | B | A |
B | C | A |
C | D | A |
E | F | E |
F | G | E |
--
-- NODE FLATTENING, N-LEVELS
--
-- delete old data
DROP TABLE IF EXISTS `absparent`;
DROP TABLE IF EXISTS `t1`;
DROP TABLE IF EXISTS `t2`;
DROP TABLE IF EXISTS `t3`;
-- create main table
CREATE TABLE `absparent` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent` varchar(3) DEFAULT NULL,
`child` varchar(3) DEFAULT NULL,
`ultimate_parent` varchar(3) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `absparent` (parent, child)
VALUES
("A", "B"),
("B", "C"),
("C", "D"),
("E", "F"),
("F", "G");
-- create temp helper tables
CREATE TABLE `t1` (id INT(11), parent VARCHAR(3), child VARCHAR(3));
INSERT INTO `t1`
SELECT id, parent, child
FROM `absparent`
WHERE parent
NOT IN (SELECT DISTINCT child FROM `absparent`);
CREATE TABLE `t2` (id INT(11), parent VARCHAR(3), child VARCHAR(3));
INSERT INTO `t2`
SELECT a.id, a.parent, a.child FROM `absparent` a, `t1` WHERE a.parent = t1.child;
CREATE TABLE `t3` (id INT(11), parent VARCHAR(3), child VARCHAR(3));
INSERT INTO `t3`
SELECT a.id, a.parent, a.child FROM `absparent` a, `t2` WHERE a.parent = t2.child;
-- update absparent table with the temp helper tables
UPDATE absparent a, t1, t2, t3
SET a.ultimate_parent = t1.parent
WHERE
(a.parent = t1.parent) OR
(a.parent = t2.parent AND t1.child = t2.parent) OR
(a.parent = t3.parent AND t1.child = t2.parent AND t2.child = t3.parent);
-- final output
SELECT * FROM absparent;
-- remove temp tables
DROP TABLE `t1`;
DROP TABLE `t2`;
DROP TABLE `t3`;
Under the premise that E is connected to the rest of the nodes and you only have one ultimate parent (has no parent node!) in the entire table, you could probably do something like this:
Create the DB:
CREATE TABLE `absparent` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent` varchar(3) DEFAULT NULL,
`child` varchar(3) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Insert the records:
INSERT INTO absparent (parent, child)
VALUES
("A", "B"),
("B", "C"),
("C", "D"),
("E", "F"),
("F", "G"),
("C", "E")
Run the query with a subquery:
The ultimate parent has to be a node that has no parent node i.e. is never a child of another node.
SELECT
parent,
child,
(
SELECT parent FROM absparent
WHERE parent
NOT IN (SELECT DISTINCT child FROM absparent)
) AS ultimate_parent
FROM absparent
Output:
parent | child | ultimate_parent |
---|---|---|
A | B | A |
B | C | A |
C | D | A |
E | F | A |
F | G | A |
C | E | A |
Upvotes: 1
Reputation: 29
Managing Hierarchical Data Using the Adjacency List Mode.
In this hierarchical data using the adjacency list mode will be a solution to this problem. This is a multiple trees structured table data. There will be many more nodes of a parent id.
Here is a demonstration for similar problem
CREATE TABLE category (
id int(10) unsigned NOT NULL AUTO_INCREMENT,
title varchar(255) NOT NULL,
parent_id int(10) unsigned DEFAULT NULL,
PRIMARY KEY (id),
FOREIGN KEY (parent_id) REFERENCES category (id)
ON DELETE CASCADE ON UPDATE CASCADE
);
INSERT INTO category(title,parent_id)
VALUES('Electronics',NULL);
INSERT INTO category(title,parent_id)
VALUES('Laptops & PC',1);
INSERT INTO category(title,parent_id)
VALUES('Laptops',2);
INSERT INTO category(title,parent_id)
VALUES('PC',2);
INSERT INTO category(title,parent_id)
VALUES('Cameras & photo',1);
INSERT INTO category(title,parent_id)
VALUES('Camera',5);
INSERT INTO category(title,parent_id)
VALUES('Phones & Accessories',1);
INSERT INTO category(title,parent_id)
VALUES('Smartphones',7);
INSERT INTO category(title,parent_id)
VALUES('Android',8);
INSERT INTO category(title,parent_id)
VALUES('iOS',8);
INSERT INTO category(title,parent_id)
VALUES('Other Smartphones',8);
INSERT INTO category(title,parent_id)
VALUES('Batteries',7);
INSERT INTO category(title,parent_id)
VALUES('Headsets',7);
INSERT INTO category(title,parent_id)
VALUES('Screen Protectors',7);
The following recursive common table expression (CTE) retrieves the whole category tree. Notice that the CTE feature has been available since MySQL 8.0
WITH RECURSIVE category_path (id, title, path) AS
(
SELECT id, title, title as path
FROM category
WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.title, CONCAT(cp.path, ' > ', c.title)
FROM category_path AS cp JOIN category AS c
ON cp.id = c.parent_id
)
SELECT * FROM category_path
ORDER BY path;
The output will be like that
This is a sample demonstration. You can try for yours this way. Hope it will help you to solve your problem
Upvotes: 0
Reputation: 142298
The [original] question and the example data show only a single DAG. There is a shortcut for finding its single root:
SELECT DISTINCT lower.parent AS the_root
FROM Tbl AS lower
LEFT JOIN Tbl AS upper WHERE upper.child = lower.parent
WHERE upper.parent IS NULL;
If you know that there is only one root, there is no need to add another column to the table.
If you can have multiple trees in the table, that lists the roots, but does not lend itself to adding the column.
Multiple roots and extra column
I suggest writing a stored procedure so you can have a loop (or use app code to do the looping).
Initialize the new column (ultimate_parent
) with NULLs
Set the roots:
UPDATE tbl AS a
SET a.ultimate_parent = a.parent
WHERE NOT EXISTS ( SELECT 1
FROM tbl
WHERE child = a.parent );
While this has "rows_affected > 0" do
UPDATE tbl AS a
JOIN tbl AS b ON b.parent = a.child
SET b.ultimate_parent = a.ultimate_parent
WHERE b.ultimate_parent IS NULL
AND a.ultimate_parent IS NOT NULL;
That will make one pass over the data for each level in the tree(s) are.
For a million rows, and/or a hundred levels deep, this would take a non-trivial amount of time. But it works for MySQL 4.1 or newer. (Before 4.1, you would need to do the looping in your app.)
Indexes
PRIMARY KEY(parent, child),
INDEX(child, parent)
(There seems to be no need for an auto_increment id.)
Upvotes: 2