Reputation: 11
Suppose we have a directed graph defined as following:
node | neighbor
-----------------
1 | 2
1 | 3
2 | 4
2 | 3
3 | 4
the above table defines the only the edges between two nodes, a couple (1,2)
for example means that node 1
and 2
are connected by an edge, here is a plot of the graph.
I also have a table of the transitive closure of the graph, this table holds all the possible paths of the graph (for example: (1,3)
is present twice because it can be reached either directly or by the path 1=>2=>3
), here is the table:
node | neighbor
-----------------
1 | 2
1 | 3
2 | 4
2 | 3
3 | 4
1 | 3
1 | 4
1 | 4
2 | 4
from these two tables, I want to return a minimized graph without losing any reachability, an idea was to only return edges that are not in dependency of the two tables, here's an example:
(1,2)
is in the first table and (2,3)
is in the second, and therefore (1,3)
can be deleted from the first table because you can reach node 3
from 1
passing by node 2
the outuput table should look like this then:
node | neighbor
-----------------
1 | 2
2 | 3
3 | 4
How can I write an SQL query that does this?
Upvotes: 0
Views: 46
Reputation: 222462
Here is one approach:
with recursive cte as (
select node, neighbor, 1 is_initial from graph
union all
select c.node, g.neighbor, 0
from cte c
inner join graph g on g.node = c.neighbor
)
select node, neighbor
from graph g
where not exists (
select 1
from cte c
where c.node = g.node and c.neighbor = g.neighbor and c.is_initial = 0
)
order by node, neighbor
This uses the first table only (I called it graph
). We start by generating all possible paths with a recursive query. This is quite similar to your closure table, but with one extra column, is_initial
, that indicates whether the path comes from the original table, or was generated during a further iteration.
Then, all that is left to do is filter the graph to remove tuples that match a "non-initial" path.
node | neighbor ---: | -------: 1 | 2 2 | 3 3 | 4
Upvotes: 0