William Stocks
William Stocks

Reputation: 341

Removing duplicate records from relational db table

I have a database table with three columns. Id, user_id, book_id. In this table, there are some duplicates. a user_id should only have one record of a book_id, but in some cases, a user_id has several book_ids. There are a couple of million records already and I'm wondering how to remove any duplicates.

Upvotes: 0

Views: 726

Answers (3)

Rostyslav
Rostyslav

Reputation: 428

Hope this query will allow you to remove duplicates:

DELETE bl1 FROM book_log bl1
JOIN book_log bl2
ON (
    bl1.id > bl2.id AND
    bl1.user_id = bl2.user_id AND
    bl1.book_id = bl2.book_id
);

Demo

Upvotes: 0

Ankit
Ankit

Reputation: 1887

Try following.

SQL SERVER

WITH ORDERED AS
(
    SELECT id
    ROW_NUMBER() OVER (PARTITION BY [user_id] , [book_id] ORDER BY id ASC) AS rn
    FROM
    tableName
)
delete from tableName
where id in ( select id from ORDERED where rn != 1)

MYSQL

delete from tableName
where id not in( 
    select MIN(id)from tableName    
    group by user_id, book_id
)

Edited as per comments - In MySQL, you can't modify the same table which you use in the SELECT part

This will solve the issue.

delete from tableName
where id not in( 
    select temp.temp_id from (
        select MIN(id) as temp_id from tableName    
        group by user_id, book_id
    ) as temp
)

This will keep only one combination of (user_id, book_id)

Upvotes: 1

John Woo
John Woo

Reputation: 263723

If you execute this statement below, it will delete all duplicate records of user_ID and leaving only the greatest ID for each user_ID

DELETE  a
FROM    tableName a
        LEFT JOIN
        (
            SELECT  user_ID, MAX(ID) max_ID
            FROM    tableName
            GROUP   BY user_ID
        ) b ON a.user_ID = b.user_ID AND
                a.ID = b.max_ID
WHERE   b.max_ID IS NULL

Upvotes: 0

Related Questions