Aldibe
Aldibe

Reputation: 1347

Creating optimal graph relation in PostgreSQL

I have a table in PostgreSQL which consists of 2 columns, first ID and second ID. Every entry in it means that there's a relation between first and second ID, and it can be guaranteed that first ID will always be larger than the second ID.

My goal is to process the table so that it can detect networks (multiple IDs that relates to one another), and change every relations of that networks in the table so that first ID is the large IDs in the network and the second ID is always the smallest ID in the network.
Example:

D->C , C->B , B->A , F->E , H->G

Will become:

D->A , C->A , B->A , F->E , H->G

Another example:

D->C , D->B , D->A

Will become:

D->A , C->A , B->A

How to do this using SQL or a Postgres procedural language?

Edit : The PostgreSQL version that i'm using is 9.4. The table consists of column id1 (integer) and id2 (integer), with both of them as primary key.

As to how to conclude that A is the smallest in the set of second example (A,B,C), I used this query to determine the smallest id2

SELECT id1, MIN(id2) FROM table GROUP BY id1

Upvotes: 1

Views: 794

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656714

Assuming a table like this:

CREATE TABLE tbl (
  id1 int
, id2 int
, PRIMARY KEY (id1, id2)
);

Circular references are not possible according to your logic.

This recursive CTE would do the job:

WITH RECURSIVE cte AS (
   SELECT t1.id1, t1.id2, t2.id2 AS next_id2
   FROM   tbl t1
   LEFT   JOIN tbl t2 ON t2.id1 = t1.id2

   UNION ALL
   SELECT t1.id1, t1.next_id2, t2.id2
   FROM   cte t1
   LEFT   JOIN tbl t2 ON t2.id1 = t1.next_id2
   WHERE  t1.next_id2 IS NOT NULL  -- stop iterating at end of graph
   )
SELECT id1, id2
FROM   cte
WHERE  next_id2 IS NULL;  -- only rows where the graph ends

In the non-recursive term, select every row and try to find the next step with a LEFT JOIN.

In the recursive term replace the second ID with the next step until we reach the end of the graph (next_id2 IS NULL).

In the outer SELECT only return rows where the graph ends. Result in arbitrary sort order.

It's unclear how you determine the deepest rabbit hole if graphs can fork.

Upvotes: 1

Related Questions