Reputation: 1022
I have a table in CockroachDB, I have populated the data into the table before applying the constraints to set the primary key, and because some of INSERT statement failed through the data-loading phase, some of the rows are loaded into the table more than one time by mistake.
The constraint I want to apply is:
CREATE UNIQUE INDEX ON "MyDB"."MyTable" ("Row_ID");
But as the duplicate data is already loaded into the table, I get the following error:
pq: multiple primary keys for table "MyTable" are not allowed
I have check to see if actually there are some duplicated rows with the following query:
SELECT
Row_ID,
COUNT(Row_ID) as id
FROM
MyTable
GROUP BY
Row_ID
HAVING
COUNT(Row_ID) > 1;
And the query showed there are some duplicate rows.
What is the best way to remove the duplicate rows in CockroachDB?
Upvotes: 0
Views: 616
Reputation: 393
If you dont care about the order:
DELETE FROM MyTable WHERE Row_ID NOT IN (
SELECT MIN(Row_ID) FROM MyTable GROUP BY Row_ID);
Upvotes: 0
Reputation: 5589
If they are an exact match you can create a new table with the distinct records.
Then delete all the old table and repopulate it from the table created in last step.
To create the table:
create table MyWorkData as select distinct * from MyTable;
To delete MyTable
delete from MyTable;
To repopulate MyTable. (Thankyou @benesch for correcting the statement)
INSERT INTO MyTable SELECT * FROM MyWorkData
Finally delte the working table.
Upvotes: 2
Reputation: 5269
If you don't care about which duplicated row you keep, run:
DELETE FROM MyTable WHERE rowid IN (
SELECT MIN(rowid) FROM MyTable GROUP BY Row_ID HAVING COUNT(*) > 1
);
For any duplicates, that query will delete all but the row that was roughly created first.†
Note that rowid
is not the same as your Row_ID
column. rowid
is an internal CockroachDB column that is magically created for tables with no primary key, and is guaranteed to be unique for each row in the table.
† rowid
stores <timestamp><node-id>
, so it sorts roughly according to insertion time, but rows inserted near-simultaneously by nodes with skewed clocks may not sort in insertion order.
Upvotes: 3