Reputation: 5087

Removing duplicate field entries in SQL

Is there anyway I can erase all the duplicate entries from a certain table (users)? Here is a sample of the type of entries I have. I must say the table users consists of 3 fields, ID, user, and pass.

mysql_query("DELETE FROM users WHERE ???") or die(mysql_error());

randomtest
randomtest
randomtest
nextfile
baby
randomtest
dog
anothertest
randomtest
baby
nextfile
dog
anothertest
randomtest
randomtest

I want to be able to find the duplicate entries, and then delete all of the duplicates, and leave one.

Upvotes: 3

Answers (13)

merkushin

Reputation: 481

This will work:

create table tmp like users;
insert into tmp select distinct name from users;
drop table users;
alter table tmp rename users;

Upvotes: 0

Sergi

Reputation: 2892

You can solve it with only one query.

If your table has the following structure:

CREATE TABLE  `users` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `username` varchar(45) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;

you could do something like that (this will delete all duplicate users based on username with and ID greater than the smaller ID for that username):

DELETE users
  FROM users INNER JOIN
  (SELECT MIN(id) as id, username FROM users GROUP BY username) AS t
  ON users.username = t.username AND users.id > t.id

It works and I've already use something similar to delete duplicates.

Upvotes: 5

homework

Reputation: 5087

Every answer above and/or below didn't work for me, therefore I decided to write my own little script. It's not the best, but it gets the job done.
Comments are included throughout, but this script is customized for my needs, and I hope the idea helps you.

I basically wrote the database contents to a temp file, called the temp file, applied the function to the called file to remove the duplicates, truncated the table, and then input the data right back into the SQL. Sounds like a lot, I know.

If you're confused as to what $setprofile is, it's a session that's created upon logging into my script (to establish a profile), and is cleared upon logging out.

<?php
// session and includes, you know the drill.
session_start();
include_once('connect/config.php');

// create a temp file with session id and current date
$datefile =  date("m-j-Y");
$file = "temp/$setprofile-$datefile.txt";

$f = fopen($file, 'w'); // Open in write mode

// call the user and pass via SQL and write them to $file
$sql = mysql_query("SELECT * FROM _$setprofile ORDER BY user DESC");
while($row = mysql_fetch_array($sql))
{
$user = $row['user'];
$pass = $row['pass'];

$accounts = "$user:$pass "; // the white space right here is important, it defines the separator for the dupe check function
fwrite($f, $accounts);

}
fclose($f);


// **** Dupe Function **** //

// removes duplicate substrings between the seperator
function uniqueStrs($seperator, $str) {
// convert string to an array using ' ' as the seperator
$str_arr = explode($seperator, $str);
// remove duplicate array values
$result = array_unique($str_arr);
// convert array back to string, using ' ' to glue it back
$unique_str = implode(' ', $result);
// return the unique string
return $unique_str;
}

// **** END Dupe Function **** //


// call the list we made earlier, so we can use the function above to remove dupes
$str = file_get_contents($file);
// seperator
$seperator = ' ';
// use the function to save a unique string
$new_str = uniqueStrs($seperator, $str);



// empty the table
mysql_query("TRUNCATE TABLE _$setprofile") or die(mysql_error());

// prep for SQL by replacing test:test with ('test','test'), etc.
// this isn't a sufficient way of converting, as  i said, it works for me.
$patterns = array("/([^\s:]+):([^\s:]+)/", "/\s++\(/");
$replacements = array("('$1', '$2')", ", (");


// insert the values into your table, and presto! no more dupes.
$sql = 'INSERT INTO `_'.$setprofile.'` (`user`, `pass`) VALUES ' . preg_replace($patterns, $replacements, $new_str) . ';';
$product = mysql_query($sql) or die(mysql_error()); // put $new_str here so it will replace new list with SQL formatting

// if all goes well.... OR wrong? :)
if($product){ echo "Completed!";
} else {
echo "Failed!";
}

unlink($file); // delete the temp file/list we made earlier
?>

Upvotes: 0

badbod99

Reputation: 7526

You need to be a bit careful of how the data in your table is used. If this really is a users table, there is likely other tables with FKs pointing to the ID column. In which case you need to update those tables to use ID you have selected to keep.

If it's just a standalone table (no table reference it)

CREATE TEMPORARY TABLE Tmp (ID int);
INSERT INTO Tmp SELECT ID FROM USERS GROUP BY User;
DELETE FROM Users WHERE ID NOT IN (SELECT ID FROM Tmp);

Users table linked from other tables

Create the temporary tables including a link table that holds all the old id's and the respective new ids which other tables should reference instead.

CREATE TEMPORARY TABLE Keep (ID int, User varchar(45));
CREATE TEMPORARY TABLE Remove (OldID int, NewID int);
INSERT INTO Keep SELECT ID, User FROM USERS GROUP BY User;
INSERT INTO Remove SELECT u1.ID, u2.ID FROM Users u1 INNER JOIN Keep u2 ON u2.User = u1.User WHERE u1.ID NOT IN (SELECT ID FROM Users GROUP BY User);

Go through any tables which reference your users table and update their FK column (likely called UserID) to point to the New unique ID which you have selected, like so...

UPDATE MYTABLE t INNER JOIN Remove r ON t.UserID = r.OldID
SET t.UserID = r.NewID;

Finally go back to your users table and remove the no longer referenced duplicates:

DELETE FROM Users WHERE ID NOT IN (SELECT ID FROM Keep);

Clean up those Tmp tables:

DROP TABLE KEEP;
DROP TABLE REMOVE;

Upvotes: 1

Emil Ivanov

Reputation: 37633

I assume that you have a structure like the following:

users
-----------------
| id | username |
-----------------
|  1 | joe      |
|  2 | bob      |
|  3 | jane     |
|  4 | bob      |
|  5 | bob      |
|  6 | jane     |
-----------------

Doing the magic with temporary is required since MySQL cannot use a sub-select in delete query that uses the delete's target table.

CREATE TEMPORARY TABLE IF NOT EXISTS users_to_delete (id INTEGER);

INSERT INTO users_to_delete (id)
    SELECT MIN(u1.id) as id
    FROM users u1
    INNER JOIN users u2 ON u1.username = u2.username
    GROUP BY u1.username;

DELETE FROM users WHERE id NOT IN (SELECT id FROM users_to_delete);

I know the query is a bit hairy but it does the work, even if the users table has more than 2 columns.

Upvotes: 1

RPL

Reputation: 79

Select your 3 columns as per your table structure and apply condition as per your requirements.

SELECT user.userId,user.username user.password FROM user As user GROUP BY user.userId, user.username HAVING (COUNT(user.username) > 1));

Upvotes: 0

cjk

Reputation: 46425

This delete script (SQL Server syntax) should work:

DELETE FROM Users
WHERE ID NOT IN (
    SELECT MIN(ID)
    FROM Users
    GROUP BY User
)

Upvotes: 1

Paul Lammertsma

Reputation: 38252

The temporary table is an excellent solution, but I'd like to provide a SELECT query that grabs duplicate rows from the table as an alternative:

SELECT * FROM `users` LEFT JOIN (
        SELECT `name`, COUNT(`name`) AS `count`
        FROM `users` GROUP BY `name`
    ) AS `grouped`
    WHERE `grouped`.`name` = `users`.`name`
    AND `grouped`.`count`>1

Upvotes: 0

Kristen

Reputation: 4291

If you have a Unique ID / Primary key on the table then:

DELETE FROM MyTable AS T1
WHERE MyID <
(
    SELECT MAX(MyID)
    FROM MyTable AS T2
    WHERE     T2.Col1 = T1.Col1
          AND T2.Col2 = T1.Col2
          ... repeat for all columns to consider duplicates ...
)

if you don't have a Unique Key select all distinct values into a temporary table, delete all original rows, and copy back from temporary table - but this will be problematic if you have Foreign Keys referring to this table

Upvotes: -1

Ferenc Deak

Reputation: 35408

You can do it with three sqls:

create table tmp as select distinct name from users;
drop table users;
alter table tmp rename users;

Upvotes: 5

Tomas

Reputation: 553

I don't know your db schema, but the simplest solution seems to be to do SELECT DISTINCT on that table, keep the result in a variable (i.e. array), delete all records from the table and then reinsert the list returne by SELECT DISTINCT previously.

Upvotes: 0

dotty

Reputation: 41433

I would get all the results, put them in an array of IDs and VALUES. Use a PHP function to work out the dupes, log all the IDs in an array, and use those values to delete the records.

Upvotes: 0

Paul Lammertsma

Reputation: 38252

A very simple solution would be to set an UNIQUE index on the table's column you wish to have unique values. Note that you subsequently cannot insert the same key twice.

Edit: My mistake, I hadn't read that last line: "I want to be able to find the duplicate entries".

Upvotes: 0

Removing duplicate field entries in SQL

Answers (13)

Related Questions