Reputation: 21
We've had an accident where multiple rows with duplicate values have been inserted into a table, and I need to find which rows in a rather specific format. So far, I have this query:
SELECT p2.id
FROM assignmentobject p1, assignmentobject p2
WHERE ST_Equals(p1.the_geom, p2.the_geom) AND
p1.id <> p2.id and p1.assignmentid = 15548
group by p1.id, p2.id
which compares the geometries of the rows and spits it out if it's the same. The IDs are primary keys, and are sequentially created.
However, this presents a problem, as this small segment of the result shows:
p1.id p2.id
35311 35314
35311 35315
35314 35311
35314 35315
35315 35311
35315 35314
As can be seen here, 35311, 35314, and 35315 have the same geometries, and because of this, all of the combinations between them are included in the result. What I'm aiming to achieve, is having the lowest or highest ID used as the "base", and ignore the other combinations that doesn't involve this "base". I.e., the result shown above would be:
p1.id p2.id
35311 35314
35311 35315
Here, the combinations between 31314 and 35315 are left out. Is this possible to achieve using pure SQL?
Upvotes: 2
Views: 529
Reputation: 4503
CREATE TABLE pair (
ll INTEGER NOT NULL
, rr INTEGER NOT NULL
, PRIMARY KEY (ll , rr)
) ;
INSERT INTO pair (ll,rr) VALUES
(35311,35314) ,(35311,35315)
,(35314,35311) ,(35314,35315)
,(35315,35311) ,(35315,35314)
;
SELECT p1.ll AS p1, p1.rr AS p2
FROM pair p1
WHERE p1.ll < p1.rr -- tie breaker
AND NOT EXISTS (
SELECT * FROM pair nx
WHERE nx.ll < nx.rr
AND nx.rr = p1.ll
)
;
The same with the original geo query packed into a CTE:
WITH pair AS (
SELECT p1.id AS ll
, p2.id AS rr
FROM assignmentobject p1
JOIN assignmentobject p2 ON ST_Equals(p1.the_geom, p2.the_geom)
-- not sure if you want this ...
AND p1.assignmentid = p2.assignmentid
WHERE p1.id <> p2.id and p1.assignmentid = 15548
-- group by seems to make no sense here
-- group by p1.id, p2.id
)
SELECT pp.ll AS p1, pp.rr AS p2
FROM pair pp
WHERE pp.ll < pp.rr -- tie breaker
AND NOT EXISTS (
SELECT * FROM pair nx
WHERE nx.ll < nx.rr
AND nx.rr = pp.ll
)
;
Upvotes: 1
Reputation: 125204
Just change the <>
operator to <
WHERE ST_Equals(p1.the_geom, p2.the_geom) AND
p1.id < p2.id and p1.assignmentid = 15548
If assignmentid is duplicated and you want all the duplicates at once
select p2.id
from
assignmentobject p1
inner join
assignmentobject p2 using(assigmentid)
where
st_equals(p1.the_geom, p2.the_geom) and
p1.id < p2.id
group by p1.id, p2.id
Upvotes: 1