Efficient SQL query to delete duplicated rows

Question

So I have a medium-sized SQLite database with ~10.3 million rows. I have some duplicated rows that I want to remove.

The column names are:

Keyword
Rank
URL

The duplicates I want to remove are rows where the keyword and rank are both the same, even if the URL is different. So I only want the first instance of each keyword/rank pair to remain in the database, with all subsequent matching rows removed.

What is the most efficient way to go through the entire DB and do this for all the rows?

Roberto · Accepted Answer

You can try something like this:

sqlite> create table my_example (keyword, rank, url);
sqlite> insert into my_example values ('aaaa', 2, 'wwww...');
sqlite> insert into my_example values ('aaaa', 2, 'wwww2..');
sqlite> insert into my_example values ('aaaa', 3, 'www2..');
sqlite> DELETE FROM my_example
   ...> WHERE rowid not in
   ...> (SELECT MIN(rowid)
   ...> FROM my_example
   ...> GROUP BY keyword, rank);
sqlite> select * from my_example;
keyword     rank        url
----------  ----------  ----------
aaaa        2           wwww...
aaaa        3           www2..
sqlite>
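
If you are driving this from a script rather than the sqlite3 shell, the same DELETE could be wrapped roughly like this. This is only a sketch using Python's standard sqlite3 module; the function name delete_duplicates and the index name idx_my_example_kw_rank are made up for illustration, and the index is an assumption on my part: it may help the GROUP BY on a table with millions of rows, but building it has its own cost, so it is worth timing on your data.

```python
import sqlite3


def delete_duplicates(conn: sqlite3.Connection) -> int:
    """Keep the lowest-rowid row per (keyword, rank) pair; return rows deleted."""
    with conn:  # single transaction: commits on success, rolls back on error
        # Assumption (not from the answer): an index on the grouped columns
        # may speed up the GROUP BY on a large table. Measure before relying on it.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_my_example_kw_rank "
            "ON my_example (keyword, rank)"
        )
        # Same statement as in the shell session above.
        cur = conn.execute(
            "DELETE FROM my_example "
            "WHERE rowid NOT IN "
            "(SELECT MIN(rowid) FROM my_example GROUP BY keyword, rank)"
        )
        return cur.rowcount


# Small in-memory demo with the same three rows as the shell session.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_example (keyword, rank, url)")
conn.executemany(
    "INSERT INTO my_example VALUES (?, ?, ?)",
    [("aaaa", 2, "wwww..."), ("aaaa", 2, "wwww2.."), ("aaaa", 3, "www2..")],
)
deleted = delete_duplicates(conn)
rows = conn.execute(
    "SELECT keyword, rank, url FROM my_example ORDER BY rowid"
).fetchall()
print(deleted, rows)
```

"First instance" here means the row with the smallest rowid in each keyword/rank group, which normally matches insertion order for a table that has not had rows deleted and reinserted.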
