Reputation: 5087
Is there anyway I can erase all the duplicate entries from a certain table (users
)? Here is a sample of the type of entries I have. I must say the table users
consists of 3 fields, ID
, user
, and pass
.
mysql_query("DELETE FROM users WHERE ???") or die(mysql_error());
randomtest
randomtest
randomtest
nextfile
baby
randomtest
dog
anothertest
randomtest
baby
nextfile
dog
anothertest
randomtest
randomtest
I want to be able to find the duplicate entries, and then delete all of the duplicates, and leave one.
Upvotes: 3
Views: 3488
Reputation: 481
This will work:
create table tmp like users;
insert into tmp select distinct name from users;
drop table users;
alter table tmp rename users;
Upvotes: 0
Reputation: 2892
You can solve it with only one query.
If your table has the following structure:
CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL auto_increment,
`username` varchar(45) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;
you could do something like that (this will delete all duplicate users based on username with and ID greater than the smaller ID for that username):
DELETE users
FROM users INNER JOIN
(SELECT MIN(id) as id, username FROM users GROUP BY username) AS t
ON users.username = t.username AND users.id > t.id
It works and I've already use something similar to delete duplicates.
Upvotes: 5
Reputation: 5087
Every answer above and/or below didn't work for me, therefore I decided to write my own little script. It's not the best, but it gets the job done.
Comments are included throughout, but this script is customized for my needs, and I hope the idea helps you.
I basically wrote the database contents to a temp file, called the temp file, applied the function to the called file to remove the duplicates, truncated the table, and then input the data right back into the SQL. Sounds like a lot, I know.
If you're confused as to what $setprofile
is, it's a session that's created upon logging into my script (to establish a profile), and is cleared upon logging out.
<?php
// session and includes, you know the drill.
session_start();
include_once('connect/config.php');
// create a temp file with session id and current date
$datefile = date("m-j-Y");
$file = "temp/$setprofile-$datefile.txt";
$f = fopen($file, 'w'); // Open in write mode
// call the user and pass via SQL and write them to $file
$sql = mysql_query("SELECT * FROM _$setprofile ORDER BY user DESC");
while($row = mysql_fetch_array($sql))
{
$user = $row['user'];
$pass = $row['pass'];
$accounts = "$user:$pass "; // the white space right here is important, it defines the separator for the dupe check function
fwrite($f, $accounts);
}
fclose($f);
// **** Dupe Function **** //
// removes duplicate substrings between the seperator
function uniqueStrs($seperator, $str) {
// convert string to an array using ' ' as the seperator
$str_arr = explode($seperator, $str);
// remove duplicate array values
$result = array_unique($str_arr);
// convert array back to string, using ' ' to glue it back
$unique_str = implode(' ', $result);
// return the unique string
return $unique_str;
}
// **** END Dupe Function **** //
// call the list we made earlier, so we can use the function above to remove dupes
$str = file_get_contents($file);
// seperator
$seperator = ' ';
// use the function to save a unique string
$new_str = uniqueStrs($seperator, $str);
// empty the table
mysql_query("TRUNCATE TABLE _$setprofile") or die(mysql_error());
// prep for SQL by replacing test:test with ('test','test'), etc.
// this isn't a sufficient way of converting, as i said, it works for me.
$patterns = array("/([^\s:]+):([^\s:]+)/", "/\s++\(/");
$replacements = array("('$1', '$2')", ", (");
// insert the values into your table, and presto! no more dupes.
$sql = 'INSERT INTO `_'.$setprofile.'` (`user`, `pass`) VALUES ' . preg_replace($patterns, $replacements, $new_str) . ';';
$product = mysql_query($sql) or die(mysql_error()); // put $new_str here so it will replace new list with SQL formatting
// if all goes well.... OR wrong? :)
if($product){ echo "Completed!";
} else {
echo "Failed!";
}
unlink($file); // delete the temp file/list we made earlier
?>
Upvotes: 0
Reputation: 7526
You need to be a bit careful of how the data in your table is used. If this really is a users table, there is likely other tables with FKs pointing to the ID column. In which case you need to update those tables to use ID you have selected to keep.
If it's just a standalone table (no table reference it)
CREATE TEMPORARY TABLE Tmp (ID int);
INSERT INTO Tmp SELECT ID FROM USERS GROUP BY User;
DELETE FROM Users WHERE ID NOT IN (SELECT ID FROM Tmp);
Users table linked from other tables
Create the temporary tables including a link table that holds all the old id's and the respective new ids which other tables should reference instead.
CREATE TEMPORARY TABLE Keep (ID int, User varchar(45));
CREATE TEMPORARY TABLE Remove (OldID int, NewID int);
INSERT INTO Keep SELECT ID, User FROM USERS GROUP BY User;
INSERT INTO Remove SELECT u1.ID, u2.ID FROM Users u1 INNER JOIN Keep u2 ON u2.User = u1.User WHERE u1.ID NOT IN (SELECT ID FROM Users GROUP BY User);
Go through any tables which reference your users table and update their FK column (likely called UserID) to point to the New unique ID which you have selected, like so...
UPDATE MYTABLE t INNER JOIN Remove r ON t.UserID = r.OldID
SET t.UserID = r.NewID;
Finally go back to your users table and remove the no longer referenced duplicates:
DELETE FROM Users WHERE ID NOT IN (SELECT ID FROM Keep);
Clean up those Tmp tables:
DROP TABLE KEEP;
DROP TABLE REMOVE;
Upvotes: 1
Reputation: 37633
I assume that you have a structure like the following:
users
-----------------
| id | username |
-----------------
| 1 | joe |
| 2 | bob |
| 3 | jane |
| 4 | bob |
| 5 | bob |
| 6 | jane |
-----------------
Doing the magic with temporary is required since MySQL cannot use a sub-select in delete query that uses the delete's target table.
CREATE TEMPORARY TABLE IF NOT EXISTS users_to_delete (id INTEGER);
INSERT INTO users_to_delete (id)
SELECT MIN(u1.id) as id
FROM users u1
INNER JOIN users u2 ON u1.username = u2.username
GROUP BY u1.username;
DELETE FROM users WHERE id NOT IN (SELECT id FROM users_to_delete);
I know the query is a bit hairy but it does the work, even if the users table has more than 2 columns.
Upvotes: 1
Reputation: 79
Select your 3 columns as per your table structure and apply condition as per your requirements.
SELECT user.userId,user.username user.password FROM user As user GROUP BY user.userId, user.username HAVING (COUNT(user.username) > 1));
Upvotes: 0
Reputation: 46425
This delete script (SQL Server syntax) should work:
DELETE FROM Users
WHERE ID NOT IN (
SELECT MIN(ID)
FROM Users
GROUP BY User
)
Upvotes: 1
Reputation: 38252
The temporary table is an excellent solution, but I'd like to provide a SELECT
query that grabs duplicate rows from the table as an alternative:
SELECT * FROM `users` LEFT JOIN (
SELECT `name`, COUNT(`name`) AS `count`
FROM `users` GROUP BY `name`
) AS `grouped`
WHERE `grouped`.`name` = `users`.`name`
AND `grouped`.`count`>1
Upvotes: 0
Reputation: 4291
If you have a Unique ID / Primary key on the table then:
DELETE FROM MyTable AS T1 WHERE MyID < ( SELECT MAX(MyID) FROM MyTable AS T2 WHERE T2.Col1 = T1.Col1 AND T2.Col2 = T1.Col2 ... repeat for all columns to consider duplicates ... )
if you don't have a Unique Key select all distinct values into a temporary table, delete all original rows, and copy back from temporary table - but this will be problematic if you have Foreign Keys referring to this table
Upvotes: -1
Reputation: 35408
You can do it with three sqls:
create table tmp as select distinct name from users;
drop table users;
alter table tmp rename users;
Upvotes: 5
Reputation: 553
I don't know your db schema, but the simplest solution seems to be to do SELECT DISTINCT
on that table, keep the result in a variable (i.e. array), delete all records from the table and then reinsert the list returne by SELECT DISTINCT
previously.
Upvotes: 0
Reputation: 41433
I would get all the results, put them in an array of IDs and VALUES. Use a PHP function to work out the dupes, log all the IDs in an array, and use those values to delete the records.
Upvotes: 0
Reputation: 38252
A very simple solution would be to set an UNIQUE
index on the table's column you wish to have unique values. Note that you subsequently cannot insert the same key twice.
Edit: My mistake, I hadn't read that last line: "I want to be able to find the duplicate entries".
Upvotes: 0