Reputation: 70
I have a table with 433,332 records in a MySql Database in Google Cloud. The Table looks something like this
Album_ID | Track_Len | Track_Name | Ft_LName1 | Ft_FName1 | Ft_LName2 | Ft_FName2 | Ft_LName3 | Ft_FName3 | Row_Num |
+---------+-----------+---------------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+---------+
| N40781 | 5.19 | Tumbala (Da Lata Remix) | NULL | Novalima | NULL | NULL | NULL | NULL | 1 |
| N40781 | 5.01 | Ruperta (Zeb Remix) | NULL | Novalima | NULL | NULL | NULL | NULL | 2 |
| N40781 | 6.35 | Coba Guarango (Toni Economides Remix) | NULL | Novalima | NULL | NULL | NULL | NULL | 3 |
| B15033 | 6.02 | II-V-P | Quartet | ARC | NULL | NULL | NULL | NULL | 4 |
| N32395 | 4.47 | My Babe | Stigers | Curtis | NULL | NULL | NULL | NULL | 5 |
| N32395 | 5.13 | Thats All Right | Stigers | Curtis | NULL | NULL | NULL | NULL | 6 |
Note the Primary key of this table is supposed to be (Album_ID, Track_Name) I am aware there are duplicate data entries in the table. I am attempting to run the following query to find and delete these entries:
select count(*)
from Track t1, Track t2
where t1.Album_ID = t2.Album_ID
AND t1.Track_Name = t2.Track_Name
AND t1.Row_Num > t2.Row_Num;
However this is taking an extremely long time to run. Can anyone suggest a more efficient way to do this?
Upvotes: 0
Views: 81
Reputation: 1270773
To get the list of duplicates, I would suggest:
select Album_ID, Track_Name, count(*)
from Track t
group by Album_ID, Track_Name,
having count(*) > 1
If you want the count, then:
select count(*) as num_keys_with_duplicates,
sum(cnt - 1) as num_duplicates
from (select Album_ID, Track_Name, count(*) as cnt
from Track t
group by Album_ID, Track_Name,
having count(*) > 1
) x;
Upvotes: 2