user25589
user25589

Reputation: 11

Minimizing a graph with SQL

Suppose we have a directed graph defined as following:

 node | neighbor
 -----------------
   1  | 2
   1  | 3
   2  | 4
   2  | 3
   3  | 4

the above table defines the only the edges between two nodes, a couple (1,2)for example means that node 1 and 2 are connected by an edge, here is a plot of the graph.

I also have a table of the transitive closure of the graph, this table holds all the possible paths of the graph (for example: (1,3) is present twice because it can be reached either directly or by the path 1=>2=>3), here is the table:

 node | neighbor
 -----------------
   1  | 2
   1  | 3
   2  | 4
   2  | 3
   3  | 4
   1  | 3
   1  | 4
   1  | 4
   2  | 4

from these two tables, I want to return a minimized graph without losing any reachability, an idea was to only return edges that are not in dependency of the two tables, here's an example:
(1,2) is in the first table and (2,3) is in the second, and therefore (1,3) can be deleted from the first table because you can reach node 3 from 1 passing by node 2
the outuput table should look like this then:

 node | neighbor
 -----------------
   1  | 2
   2  | 3
   3  | 4
  

How can I write an SQL query that does this?

Upvotes: 0

Views: 46

Answers (1)

GMB
GMB

Reputation: 222462

Here is one approach:

with recursive cte as (
    select node, neighbor, 1 is_initial from graph
    union all 
    select c.node, g.neighbor, 0
    from cte c
    inner join graph g on g.node = c.neighbor
)
select node, neighbor
from graph g
where not exists (
        select 1 
        from cte c
        where c.node = g.node and c.neighbor = g.neighbor and c.is_initial = 0
    )
order by node, neighbor

This uses the first table only (I called it graph). We start by generating all possible paths with a recursive query. This is quite similar to your closure table, but with one extra column, is_initial, that indicates whether the path comes from the original table, or was generated during a further iteration.

Then, all that is left to do is filter the graph to remove tuples that match a "non-initial" path.

Demo on DB Fiddle:

node | neighbor
---: | -------:
   1 |        2
   2 |        3
   3 |        4

Upvotes: 0

Related Questions