sqldirected-acyclic-graphstransitive-closure-table

Reputation: 1689

Moving in Closure Table with Multiple Parents

I have the following DAG

A --> B

|     |
v     v

C --> D

Here is the closure table

| Ancestor | Descendant | Depth |
---------------------------------
| A        | A          | 0     |
| A        | B          | 1     |
| A        | C          | 1     |
| A        | D          | 2     |
| A        | D          | 2     |
| B        | B          | 0     |
| B        | D          | 1     |
| C        | C          | 0     |
| C        | D          | 1     |
| D        | D          | 0     |

How would I go about removing path B > D (thus removing A > B > D) without also removing A > C > D and C > D.

Right now I'm using the following query but it only works when every node only has 1 parent.

DELETE FROM `Closure`
WHERE `Descendant` IN (SELECT `Descendant` FROM `Closure` WHERE `Ancestor`=@Node)
AND `Ancestor` NOT IN (SELECT `Descendant` FROM `Closure` WHERE `Ancestor`=@Node);

Upvotes: 6

Answers (4)

Obscerno

Reputation: 193

While tracking depth and allowing multiple parents at the same time is probably possible, I get a code smell from it, especially when the solution involves having duplicates pairs. Thomas' answer outlines that issue.

I'm going to simplify the question a bit to just focus on unlinking a node when multiple parents are allowed, because it's a tricky enough problem on its own. I'm discarding the depth column entirely and assuming there are no duplicate pairs.

You have to take into account children of D, which gets complicated fast. Say we have:

A --> B
       
|     |
v     v

C --> D --> E

We want to unlink D from B, which means we also have to remove links between E and B. But what if they're connected like this:

A --> B --> C
             
      |     |
      v     v

      D --> E < -- G
      
      |
      V

H --> F

In this case if we disconnect B > D, we don't want to unlink B and E anymore, because E is still linked to B through C. But we do want to disconnect F from B.

I'll go through my solution below using that second example. I know that D only has one parent in this example, but it still works perfectly if D has multiple parents; I can more easily demonstrate some of the edge cases this way so that's why I'm doing it like this.

Here's what the table would look like:

| Ancestor | Descendant |
-------------------------
| A        | A          |
| A        | B          |
| A        | C          |
| A        | D          |
| A        | E          |
| A        | F          |
| B        | B          |
| B        | C          |
| B        | D          |
| B        | E          |
| B        | F          |
| C        | C          |
| C        | E          |
| D        | D          |
| D        | E          |
| D        | F          |
| E        | E          |
| F        | F          |
| G        | G          |
| G        | E          |
| H        | H          |
| H        | F          |

Query 1: Get all descendants of D, including D

SELECT `Descendant`
FROM `Closure`
WHERE `Ancestor` = @Node

This will return: D, E, F

Query 2: Get all ancestors of B, including B

SELECT `Ancestor`
FROM `Closure`
WHERE `Descendant` = @ParentNode

This will return: A, B

Query 3a: Get all ancestors of the items in Query 1 that do not appear in Query 1 or 2

SELECT DISTINCT `Ancestor`
FROM `Closure`
WHERE `Descendant` IN (@Query1)
AND `Ancestor` NOT IN (@Query1)
AND `Ancestor` NOT IN (@Query2)

This will return: C, G, H

The goal here is to get all parents of E and F that may reconnect farther up the chain.

Query 3b: this is exactly the same as Query 3a, except it returns both ancestors and descendants

SELECT DISTINCT `Ancestor`, `Descendant`
[ ... ]

This will return: (C, E), (G, E), (H, F)

We'll need this later.

Query 4: Filter results of Query 3a down to nodes that reconnect farther up the chain

SELECT `Ancestor`,`Descendant`
FROM `Closure`
WHERE `Descendant` IN (@Query3a)
AND (`Ancestor` IN (@Query2) OR `Ancestor` = `Descendant`))

This will return: (A, C), (B, C), (C, C)

We now have references to all parents of C that should not be unlinked. Note that we have no links to parents of F. That's because F is not connected to B and A except through D (which we're unlinking).

Query 5: Construct the pairs that should be kept, using the results of Query 3b as a bridge between Queries 1 and 4

SELECT `Query4`.`ancestor`,`Query3b`.`descendant`
FROM (@Query3b) as `Query3b`
LEFT JOIN (@Query4) as `Query4`
WHERE `Query3b`.`descendant` IN (@Query1)

This will return: (A, E), (B, E)

Query 6: Run the regular query for orphaning a node and its children, except exclude all pairs returned by Query 5

DELETE FROM `Closure`
WHERE `Descendant` IN (@Query1)
AND `Ancestor` IN (@Query2)
AND (`Ancestor`, `Descendant`) NOT IN (@Query5)

After this operation, we will have removed the following links:

| Ancestor | Descendant |
-------------------------
| A        | D          |
| A        | F          |
| B        | D          |
| B        | F          |

Both D and F are correctly unlinked, and E correctly retains its connections to A and B, leaving:

A --> B --> C
             
            |
            v

      D --> E < -- G
      
      |
      V

H --> F

Let me know if I've missed anything! I just solved this problem myself today and I may run into more edge cases as time goes on. If I find any I'll update this answer.

Upvotes: 0

Don

Reputation: 4147

Given your current model I'm not sure it's possible. I'd propose that you add a column to count the number of paths tracking how many different ways there are to get from any node X to node Y.

So rather than your table you end up with

| Ancestor | Descendant | Depth | Refs  |
-----------------------------------------
| A        | A          | 0     | 1     |
| A        | B          | 1     | 1     |
| A        | C          | 1     | 1     |
| A        | D          | 2     | 2     |
| B        | B          | 0     | 1     |
| B        | D          | 1     | 1     |
| C        | C          | 0     | 1     |
| C        | D          | 1     | 1     |
| D        | D          | 0     | 1     |

Removing a node would then entail an update statement followed by a delete statement. The update would, instead of deleting the entries it finds, decrement the number of references for that entry. Then you could delete the entries with 0 or fewer references afterwards.

Coming up with a SQL query which does the update is escaping me at the moment but in theory this should work without having to completely reconstruct the closure table...

Upvotes: 3

Thomas

Reputation: 64635

First, I believe there is a duplicate entry in your table. (A,D) appears twice. Second, after removing the edge (B,D), the following paths should remain:

Node self-maps: (A,A), (B,B), (C,C), (D,D)
(A,B)
(A,C)
(A,D) ( through C )

Thus, to remove the edge (B,D) in this example, all that is required is to remove that one row:

Delete MyTable 
Where Ancestor = 'B'
    And Descendant = 'D'

A closure table is still only mapping relations between two nodes. What makes it special is that it is mapping every indirect relation effectively as a direct relation. The edge (B,D) is simply saying that you can get from B to D. That row alone says nothing about how you got to B nor does it say anything about how many nodes it took to get from B to D; it simply saying you can get from B to D. Thus, there is no edge listed for A > B > D per se. Rather, all that is captured is that you can get from A to B and from A to D which is still true even if the edge (B,D) is removed.

Upvotes: 2

mgibsonbr

Reputation: 22007

In natural language, that would be: "Delete ancestor-descendant relashionship to D, if there is no parent of D besides B that is also a descendant of A". Is that correct?

(Edit: no, that's not correct; not only relashionships to D must be removed, but also relashionships to every descendant of D. Thus, that criteria is not valid...)

My tentative SQL would then be:

DELETE a
FROM Closure AS a
    INNER JOIN Closure AS b ON a.Descendant = b.Descendant
WHERE
    a.Descendant IN (SELECT Descendant FROM Closure WHERE Ancestor = {Child}) AND
    b.Depth = 1 AND
    b.Ancestor != {Parent} AND
    a.Ancestor NOT IN (SELECT Ancestor FROM Closure WHERE Descendant = b.Ancestor)

(Sorry if I got the query wrong - or used non-standard features - I'm not actually experienced with that. But my natural language description should give an insight for what actually needs to go on the query)

Update: On second thought, I don't believe my query will work for all cases. Consider this:

A --> B --> D --> E --> F

F is a descendant of D (True)
E is a parent of F (True)
E is not B (True)
A is not an ancestor of E (False)

Thus, A >> F won't be removed, even though it should. Sorry I couldn't help, but that seems a problem too big to fit in a single query. I'd suggest looking for an algorithmic solution first, then seeing how that could be implemented in your case.

Upvotes: 1

Moving in Closure Table with Multiple Parents

Answers (4)

Related Questions