Reputation: 103
I am creating a table of undirected graph like below.
+-------------------+------------------------+----------------------+
| id | node_a | node_b |
+-------------------+------------------------+----------------------+
| 1 | a | b |
+-------------------+------------------------+----------------------+
| 2 | a | c |
+-------------------+------------------------+----------------------+
| 3 | a | d |
+-------------------+------------------------+----------------------+
| 4 | b | a |
+-------------------+------------------------+----------------------+
| 5 | b | c |
+-------------------+------------------------+----------------------+
...
row id=1 and id=4 are duplicate rows and one shall be deleted. What would be an efficient way to remove all duplicate rows in this table?
Upvotes: 1
Views: 163
Reputation: 1269483
You can generate the distinct rows by doing:
select e.*
from edges e
where e.node_a < e.node_b
union all
select e.*
from edges e
where e.node_a > e.node_b and
not exists (select 1
from edges e2
where e2.node_a = e.node_b and e2.node_b = e.node_a
);
If you actually have duplicates that are not transposed, using union
instead of union all
.
The above preserves the original edges in the table. If that is not a concern, a simple method is:
select distinct least(node_a, node_b) as node_a, greatest(node_a, node_b) as node_b
from edges e;
Upvotes: 2