Reputation: 6333
I have a table called connections
that shows mirrored user connections.
For example:
The resulting table looks like this:
user_id | connected_to |
---|---|
1 | 2 |
2 | 1 |
2 | 3 |
3 | 2 |
4 | 6 |
6 | 4 |
5 | 6 |
6 | 5 |
I want to form sub-networks by joining all users that have other users in common and assign each of them the smallest user_id
.
The result I'm looking for is:
user_id | subnetwork_id |
---|---|
1 | 1 |
2 | 1 |
3 | 1 |
4 | 4 |
5 | 4 |
6 | 4 |
The logic is as follows:
user_id
1 is the smallest of all three, so subnetwork_id
should equal 1.user_id
4 is the smallest of all three, so subnetwork_id
should be 4.I wrote a recursive query (will post in future edit) that self-joins the connections
table with itself on user_id
and connected_to
.
My query does not work with the second example (sub-network 4, 5 and 6) because the greatest user_id
is the one that connects them all together.
How can I go from table connections
to the desired result?
Upvotes: 0
Views: 73
Reputation: 2479
Here is a solution using a recursive CTE.
The idea is to build a network for each node in a subquery while aggregating the user_id
values of connected nodes in an array (network
), which is used to stop recursion since the original connections are mirrored. Then the minimum user_id
value from all interconnected nodes is used as a network id.
WITH users AS (
SELECT DISTINCT user_id FROM connections
)
SELECT
u.user_id,
(
WITH RECURSIVE cte AS (
SELECT
c.user_id,
ARRAY[c.user_id] AS network
FROM connections c
WHERE c.user_id = u.user_id
UNION ALL
SELECT
c.user_id,
ARRAY[c.user_id] || cte.network AS network
FROM connections c
JOIN cte ON c.connected_to = cte.user_id
AND NOT (ARRAY[c.user_id] && cte.network)
)
SELECT MIN(user_id) FROM cte
) AS subnetwork_id
FROM users u
ORDER BY user_id, subnetwork_id
Query output
user_id | subnetwork_id |
---|---|
1 | 1 |
2 | 1 |
3 | 1 |
4 | 4 |
5 | 4 |
6 | 4 |
You can check a working demo here
Upvotes: 1
Reputation: 713
I was using your first version (without mirroring), and assuming the following query should return all the connections in a single tree (assuming no loops):
SELECT LEAST(min(user_id), min(connected_to)) FROM (WITH RECURSIVE connect_rec AS (
SELECT
user_id,
connected_to
FROM
connections
WHERE
connected_to = 1 OR user_id = 1
UNION
SELECT
cc.user_id,
cc.connected_to
FROM
connections cc
INNER JOIN connect_rec rec ON (cc.connected_to = rec.user_id or cc.user_id = rec.connected_to or cc.user_id = rec.user_id or cc.connected_to = rec.connected_to)
) SELECT
*
FROM
connect_rec) full_connections
you can do a for loop for any given range (i'm just using 1..10 here, you can select until max of all values):
do $$
declare f record;
begin
for counter in 1..10 loop
for f in (SELECT LEAST(min(user_id), min(connected_to)) as min_tree FROM (WITH RECURSIVE connect_rec AS (
SELECT user_id, connected_to
FROM connections
WHERE connected_to = counter OR user_id = counter
UNION
SELECT cc.user_id, cc.connected_to
FROM connections cc
INNER JOIN connect_rec rec ON (cc.connected_to = rec.user_id or cc.user_id = rec.connected_to or cc.user_id = rec.user_id or cc.connected_to = rec.connected_to))
SELECT * FROM connect_rec) full_connections)
loop
raise notice '% - min %', counter, f.min_tree;
end loop;
end loop;
end; $$
the result is:
> notice: 1 - min 1
> notice: 2 - min 1
> notice: 3 - min 1
> notice: 4 - min 4
> notice: 5 - min 4
> notice: 6 - min 4
> notice: 7 - min <NULL>
> notice: 8 - min <NULL>
> notice: 9 - min <NULL>
> notice: 10 - min <NULL>
> OK
> Time: 0,004s
Upvotes: 1