Zach Russell
Zach Russell

Reputation: 1130

Efficient SQL query to delete duplicated rows

So I have a medium sized SQLite with ~10.3 million rows. I have some duplicated rows that I want to remove:

The column names are:

  1. Keyword
  2. Rank
  3. URL

The duplication I want to remove would be where the keyword and rank are both the same, but, the URL could be different. So I would only want the first instance of the keyword/rank pair to remain in the database and remove all subsequent matching rows.

What is the most efficient way to go through the entire DB and do this for all the rows?

Upvotes: 0

Views: 64

Answers (2)

Utsav
Utsav

Reputation: 8093

When you say So I would only want the first instance of the keyword/rank pair to remain in the database and remove all subsequent matching rows., you can never guarantee that. The reason is that your table dont have a unique key (like id or create_date). So there is no guarantee that the row which was entered first will be returned first if you select it again. So keeping this part aside, you can do something like this which will give you first instance most of the time.

delete from tbl 
where 
rowid not in
(
select  min(rowid) 
from tbl
group by Keyword,Rank
)

See sqlfiddle example here

Upvotes: 1

Roberto
Roberto

Reputation: 195

You can try something like this:

sqlite> create table my_example (keyword, rank, url);
sqlite> insert into my_example values ('aaaa', 2, 'wwww...');
sqlite> insert into my_example values ('aaaa', 2, 'wwww2..');
sqlite> insert into my_example values ('aaaa', 3, 'www2..');
sqlite> DELETE FROM my_example
   ...> WHERE rowid not in
   ...> (SELECT MIN(rowid)
   ...> FROM my_example
   ...> GROUP BY keyword, rank);
sqlite> select * from my_example;
keyword     rank        url
----------  ----------  ----------
aaaa        2           wwww...
aaaa        3           www2..
sqlite>

Upvotes: 3

Related Questions