Reputation:

Find Ultimate Parent

I'm new to SQL and ran into this problem, the idea is to flatten the hierarchy. This is the input table, where each child has a parent. The objective is to find the ultimate parent. Input:

Parent | Child
-------|------
A      | B
B      | C
C      | D
E      | F
F      | G

Output

Parent | Child | Ultimate Parent
-------|------ |----------------
A      | B     | A
B      | C     | A
C      | D     | A
E      | F     | E
F      | G     | E

I can't seem to find a way to do this using SQL. Any help appreciated.

Upvotes: 1

Answers (5)

Byron

Reputation: 339

If you have control over the input, then you can use a "Modified Preorder Tree Traversal" and 2 numeric columns to represent the scope of each node of the tree. This means no recursion, and not having to know the depth of the tree in advance.

Upvotes: 0

Farshid Shekari

Reputation: 2449

You can use recursive query like below and this implementation will return every child's root parent and it doesn't matter how many root parent exists:

with RECURSIVE tree as(
        select t1.*, t1.child as Ultimate_Parent
        from table_name t1
        where parent is null
        union all
        select t1.*, t.Ultimate_Parent
        from table_name t1
        join tree t on t.child = t1.parent
)
select * from tree;

and I supposed your data is like below:

Parent	Child
Null	A
A	B
B	C
C	D
Null	E
E	F
F	G

and also below image is implemented in the SQL server:

Upvotes: 0

F. Müller

Reputation: 4062

Since OP asked for a Procedure-Less-Solution for multiple root-nodes after resolving the ambiguous question, the part right below was added. It represents an experimental addition to the previous answer, that was based on the "one root idea", see below.

Semi-Automatic N-Roots solution attempt without the use of Procedures

Why Semi-Automatic?
Because you have to know the level of nesting beforehand and you would also have to adjust the numbers of helper-tables that you want to use including altering the final 'absparent' table update logic.

How does it work?
The idea behind this approach is that we first identify the root nodes and save them to a table (t1).

Next, we want to find the nodes that sit one layer below the root (t2) - we already store the children along in (t1). So we fill the (t2) table with the information provided from (t1) and the main table (absparent).

We then repeat this for all n-layers that should be supported.

In the update-logic, we connect the helper-tables and find the (t1) root node i.e. the ultimate parent.

I used https://sqliteonline.com for testing.

The output would look like this:

Table t1:

id	parent	child
1	A	B
4	E	F

Table t2:

id	parent	child
2	B	C
5	F	G

Table t3:

id	parent	child
3	C	D

Table 'absparent':

parent	child	ultimate_parent
A	B	A
B	C	A
C	D	A
E	F	E
F	G	E

-- 
-- NODE FLATTENING, N-LEVELS
-- 

-- delete old data
DROP TABLE IF EXISTS `absparent`;
DROP TABLE IF EXISTS `t1`;
DROP TABLE IF EXISTS `t2`;
DROP TABLE IF EXISTS `t3`;

-- create main table
CREATE TABLE `absparent` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `parent` varchar(3) DEFAULT NULL,
  `child` varchar(3) DEFAULT NULL,
  `ultimate_parent` varchar(3) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `absparent` (parent, child)
VALUES
  ("A", "B"),
  ("B", "C"),
  ("C", "D"),
  ("E", "F"),
  ("F", "G");


-- create temp helper tables
CREATE TABLE `t1` (id INT(11), parent VARCHAR(3), child VARCHAR(3));
INSERT INTO `t1`
  SELECT id, parent, child
  FROM `absparent`
  WHERE parent
  NOT IN (SELECT DISTINCT child FROM `absparent`);

CREATE TABLE `t2` (id INT(11), parent VARCHAR(3), child VARCHAR(3));
INSERT INTO `t2`
  SELECT a.id, a.parent, a.child FROM `absparent` a, `t1` WHERE a.parent = t1.child;

CREATE TABLE `t3` (id INT(11), parent VARCHAR(3), child VARCHAR(3));
INSERT INTO `t3`
  SELECT a.id, a.parent, a.child FROM `absparent` a, `t2` WHERE a.parent = t2.child;


-- update absparent table with the temp helper tables
UPDATE absparent a, t1, t2, t3
SET a.ultimate_parent = t1.parent
WHERE
  (a.parent = t1.parent) OR
  (a.parent = t2.parent AND t1.child = t2.parent) OR
  (a.parent = t3.parent AND t1.child = t2.parent AND t2.child = t3.parent);

-- final output
SELECT * FROM absparent;

-- remove temp tables
DROP TABLE `t1`;
DROP TABLE `t2`;
DROP TABLE `t3`;

One root only

Under the premise that E is connected to the rest of the nodes and you only have one ultimate parent (has no parent node!) in the entire table, you could probably do something like this:

Create the DB:

CREATE TABLE `absparent` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `parent` varchar(3) DEFAULT NULL,
  `child` varchar(3) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Insert the records:

INSERT INTO absparent (parent, child)
VALUES
("A", "B"),
("B", "C"),
("C", "D"),
("E", "F"),
("F", "G"),
("C", "E")

Run the query with a subquery:
The ultimate parent has to be a node that has no parent node i.e. is never a child of another node.

SELECT
parent,
child,
(
  SELECT parent FROM absparent
  WHERE parent
  NOT IN (SELECT DISTINCT child FROM absparent)
) AS ultimate_parent
FROM absparent

Output:

parent	child	ultimate_parent
A	B	A
B	C	A
C	D	A
E	F	A
F	G	A
C	E	A

Upvotes: 1

Harunur Rashid

Reputation: 29

Managing Hierarchical Data Using the Adjacency List Mode.

In this hierarchical data using the adjacency list mode will be a solution to this problem. This is a multiple trees structured table data. There will be many more nodes of a parent id.

Here is a demonstration for similar problem

CREATE TABLE category (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  title varchar(255) NOT NULL,
  parent_id int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (id),
  FOREIGN KEY (parent_id) REFERENCES category (id) 
    ON DELETE CASCADE ON UPDATE CASCADE
);

INSERT INTO category(title,parent_id) 
VALUES('Electronics',NULL);
INSERT INTO category(title,parent_id) 
VALUES('Laptops & PC',1);

INSERT INTO category(title,parent_id) 
VALUES('Laptops',2);
INSERT INTO category(title,parent_id) 
VALUES('PC',2);

INSERT INTO category(title,parent_id) 
VALUES('Cameras & photo',1);
INSERT INTO category(title,parent_id) 
VALUES('Camera',5);

INSERT INTO category(title,parent_id) 
VALUES('Phones & Accessories',1);
INSERT INTO category(title,parent_id) 
VALUES('Smartphones',7);

INSERT INTO category(title,parent_id) 
VALUES('Android',8);
INSERT INTO category(title,parent_id) 
VALUES('iOS',8);
INSERT INTO category(title,parent_id) 
VALUES('Other Smartphones',8);

INSERT INTO category(title,parent_id) 
VALUES('Batteries',7);
INSERT INTO category(title,parent_id) 
VALUES('Headsets',7);
INSERT INTO category(title,parent_id) 
VALUES('Screen Protectors',7);

The following recursive common table expression (CTE) retrieves the whole category tree. Notice that the CTE feature has been available since MySQL 8.0

WITH RECURSIVE category_path (id, title, path) AS
(
  SELECT id, title, title as path
    FROM category
    WHERE parent_id IS NULL
  UNION ALL
  SELECT c.id, c.title, CONCAT(cp.path, ' > ', c.title)
    FROM category_path AS cp JOIN category AS c
      ON cp.id = c.parent_id
)
SELECT * FROM category_path
ORDER BY path;

The output will be like that

This is a sample demonstration. You can try for yours this way. Hope it will help you to solve your problem

Upvotes: 0

Rick James

Reputation: 142298

The [original] question and the example data show only a single DAG. There is a shortcut for finding its single root:

SELECT DISTINCT lower.parent AS the_root
    FROM Tbl AS lower
    LEFT JOIN Tbl AS upper  WHERE upper.child = lower.parent
    WHERE upper.parent IS NULL;

If you know that there is only one root, there is no need to add another column to the table.

If you can have multiple trees in the table, that lists the roots, but does not lend itself to adding the column.

Multiple roots and extra column

I suggest writing a stored procedure so you can have a loop (or use app code to do the looping).

Initialize the new column (ultimate_parent) with NULLs

Set the roots:

 UPDATE tbl AS a
     SET a.ultimate_parent = a.parent
     WHERE NOT EXISTS ( SELECT 1
            FROM tbl
            WHERE child = a.parent );

While this has "rows_affected > 0" do

 UPDATE tbl AS a
   JOIN tbl AS b ON b.parent = a.child
     SET b.ultimate_parent = a.ultimate_parent
     WHERE b.ultimate_parent IS NULL
       AND a.ultimate_parent IS NOT NULL;

That will make one pass over the data for each level in the tree(s) are.

For a million rows, and/or a hundred levels deep, this would take a non-trivial amount of time. But it works for MySQL 4.1 or newer. (Before 4.1, you would need to do the looping in your app.)

Indexes

PRIMARY KEY(parent, child),
INDEX(child, parent)

(There seems to be no need for an auto_increment id.)

Upvotes: 2

Find Ultimate Parent

Answers (5)

Semi-Automatic N-Roots solution attempt without the use of Procedures

One root only

Related Questions