Reputation: 341
I have a database table with three columns. Id, user_id, book_id. In this table, there are some duplicates. a user_id should only have one record of a book_id, but in some cases, a user_id has several book_ids. There are a couple of million records already and I'm wondering how to remove any duplicates.
Upvotes: 0
Views: 726
Reputation: 428
Hope this query will allow you to remove duplicates:
DELETE bl1 FROM book_log bl1
JOIN book_log bl2
ON (
bl1.id > bl2.id AND
bl1.user_id = bl2.user_id AND
bl1.book_id = bl2.book_id
);
Upvotes: 0
Reputation: 1887
Try following.
SQL SERVER
WITH ORDERED AS
(
SELECT id
ROW_NUMBER() OVER (PARTITION BY [user_id] , [book_id] ORDER BY id ASC) AS rn
FROM
tableName
)
delete from tableName
where id in ( select id from ORDERED where rn != 1)
MYSQL
delete from tableName
where id not in(
select MIN(id)from tableName
group by user_id, book_id
)
Edited as per comments - In MySQL, you can't modify the same table which you use in the SELECT part
This will solve the issue.
delete from tableName
where id not in(
select temp.temp_id from (
select MIN(id) as temp_id from tableName
group by user_id, book_id
) as temp
)
This will keep only one combination of (user_id, book_id)
Upvotes: 1
Reputation: 263723
If you execute this statement below, it will delete all duplicate records of user_ID
and leaving only the greatest ID
for each user_ID
DELETE a
FROM tableName a
LEFT JOIN
(
SELECT user_ID, MAX(ID) max_ID
FROM tableName
GROUP BY user_ID
) b ON a.user_ID = b.user_ID AND
a.ID = b.max_ID
WHERE b.max_ID IS NULL
Upvotes: 0