Reputation: 866
I have this sql query, which selects duplicates it finds in the table by movie_name:
SQL:
SELECT movies.movie_name, movies.year FROM movies
INNER JOIN (SELECT movie_name FROM movies
GROUP BY movie_name HAVING count(movie_id) > 1) dup ON movies.movie_name = dup.movie_name
// want also to test for same year, not just movie_name i.e movies.year = dup.year
is this possible?
Upvotes: 0
Views: 180
Reputation: 575
I really like Tony's idea to use MIN
. However, whole query can be simplier:
DELETE FROM movies WHERE movie_id NOT IN
(SELECT MIN(movie_id) FROM movies GROUP BY movie_name, year);
Upvotes: 0
Reputation: 20320
SELECT movies.movie_name, movies.year FROM movies
INNER JOIN (SELECT movie_name, year FROM movies
GROUP BY movie_name,year HAVING count(movie_id) > 1) dup ON movies.movie_name = dup.movie_name
and movies.year = dup.year
would seem to be a reasonable start...
By deleting one I assume you mean keeping one, don't forget you could have more than one duplicate
Lets say we'll keep the first one get rid of the rest, and the one with the earliest movie_id is the first
So
Select Min(Movie_id), Movie_Name, Year From Movies Group By Movie_Name, Year
would give you all the ones to keep
Select Movie_id,Movie_Name,Year From Movies m
Left Join
(Select Min(Movie_id), Movie_Name, Year From Movies Group By Movie_Name, Year) keep
On keep.movieid = m.movieid Where keep.Movie_Id is null
Above is all those records which are in movies but not in the query of all those we want to keep.
So that gives us all the ones you want to get rid of. For Cthulhu's sake don't trust me or yourself on this! Take a back up before you do the delete!
Delete m From Movies m
Left Join
(Select Min(Movie_id), Movie_Name, Year From Movies Group By Movie_Name, Year) keep
On keep.movieid = m.movieid Where keep.Movie_Id is null
So now we've took the query we proved (you did prove it didn't you! ) and instead of selecting the offending records we are deleting them.
Don't forget the back up!
Upvotes: 1