Reputation: 640
I have to clean up records from a table that doesn't have a primary key or a unique constraint.
Table definition:
create table person(
name text,
staff_id integer,
work_code text,
location text
);
Unsurprisingly, it contains a lot of duplicates and partial duplicates. What is the best way to transform the records into a unique set? I don't have to care about any columns besides name and staff_id.
Upvotes: 1
Views: 1996
Reputation: 656321
As you state that you

don't have to care about other columns besides name and staff_id

this could be your procedure to clean up the table:
1.) Create a temporary table of unique rows:
CREATE TEMP TABLE p_tmp AS
SELECT DISTINCT ON (name, staff_id)
name, staff_id, work_code, location
FROM person
ORDER BY name, staff_id, work_code, location;
This arbitrarily picks the "first" row per (name, staff_id): the one with the minimum work_code and, within that, the minimum location. DISTINCT ON keeps the first row of each group in ORDER BY order.
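If a different representative row suits you better, adjust the ORDER BY. For example (a variation of my own, not required by the question), to keep the highest work_code per person instead:

```
-- DISTINCT ON returns the first row of each (name, staff_id) group
-- in ORDER BY order, so sorting work_code DESC changes which
-- duplicate survives.
CREATE TEMP TABLE p_tmp AS
SELECT DISTINCT ON (name, staff_id)
       name, staff_id, work_code, location
FROM   person
ORDER  BY name, staff_id, work_code DESC, location;
```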
2.) Empty table:
TRUNCATE person;
3.) Re-INSERT unique tuples:
INSERT INTO person SELECT * FROM p_tmp;
Make sure dupes don't creep back in. Add a surrogate primary key plus a unique constraint:
ALTER TABLE person ADD COLUMN person_id serial PRIMARY KEY;
ALTER TABLE person ADD UNIQUE (name, staff_id);
Or just add a multi-column primary key:
ALTER TABLE person ADD PRIMARY KEY (name, staff_id);
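Either way, once the unique constraint or primary key exists, later inserts can skip duplicates instead of raising an error (assuming Postgres 9.5 or later for ON CONFLICT; the sample values here are made up):

```
INSERT INTO person (name, staff_id, work_code, location)
VALUES ('Alice', 42, 'A1', 'HQ')
ON CONFLICT (name, staff_id) DO NOTHING;  -- silently skip duplicates
```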
The temporary table will be dropped at the end of the session automatically.
Of course, all of this is best done inside a single transaction, so you don't lose anything in the unlikely case that you run into a problem halfway through. Some clients do that automatically for a batch of SQL statements executed at once.
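Put together, the whole cleanup might look like this (a sketch of the steps above in one transaction; TRUNCATE is transactional in Postgres, so a failure anywhere rolls everything back):

```
BEGIN;

-- 1. One representative row per (name, staff_id)
CREATE TEMP TABLE p_tmp AS
SELECT DISTINCT ON (name, staff_id)
       name, staff_id, work_code, location
FROM   person
ORDER  BY name, staff_id, work_code, location;

-- 2. Empty the original table
TRUNCATE person;

-- 3. Re-insert the unique rows
INSERT INTO person
SELECT * FROM p_tmp;

-- 4. Keep dupes out for good
ALTER TABLE person ADD COLUMN person_id serial PRIMARY KEY;
ALTER TABLE person ADD UNIQUE (name, staff_id);

COMMIT;
```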
Upvotes: 4
Reputation: 121702
Maybe this, to find the duplicated rows first?
select t.name, t.staff_id, t.work_code, t.location
from (
select name, staff_id, work_code, location, count(*) nr
from person
group by name, staff_id, work_code, location
) t
where t.nr > 1;
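Note that this only finds rows duplicated across all four columns. To catch the partial duplicates the question mentions, a variation (mine) groups by (name, staff_id) only, and HAVING avoids the subquery:

```
-- People appearing more than once, regardless of work_code/location
select name, staff_id, count(*) as nr
from person
group by name, staff_id
having count(*) > 1;
```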
Upvotes: 0