Reputation: 5755
Suppose having a table named contact
with next structure:
id INT -- primary key,autoincrement,index
firstname VARCHAR (255),
lastname VARCHAR(255),
type ENUM
Executing such query:
SELECT c1.id AS c1_id, c2.id AS c2_id
FROM contact c1
INNER JOIN contact c2 ON c1.firstname = c2.firstname AND c1.lastname = c2.lastname
WHERE c1.id <> c2.id AND c1.type=c2.type
is ok on a small ammount of records... but when the record count grows from 30-40 to 1000 this query is very slow.Need to abstract from the record count and speed up this query maximum as possible.Any suggestions?
Upvotes: 1
Views: 75
Reputation: 33945
Except for the "<>", your query's fine (In fact, it may even be fractionally faster than the alternatives put forward!). It's just your indexes that need some work...
CREATE TABLE contact
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,firstname VARCHAR(255) NOT NULL
,lastname VARCHAR(255) NOT NULL
,type ENUM('small','medium','large') NOT NULL
);
INSERT INTO contact VALUES
(NULL,'John','Brown','small'),
(NULL,'Bill','Red','small'),
(NULL,'Paul','Orange','medium'),
(NULL,'Mike','Green','large'),
(NULL,'John','Scarlet','small'),
(NULL,'John','Cyan','medium'),
(NULL,'Fiona','Brown','large'),
(NULL,'John','Brown','small'),
(NULL,'Chris','Copper','medium'),
(NULL,'Steve','Silver','large');
INSERT INTO contact SELECT NULL,x.firstname, y.lastname, z.type FROM contact x, contact y, contact z;
SELECT COUNT(*) FROM contact;
+----------+
| COUNT(*) |
+----------+
| 1010 |
+----------+
1 row in set (0.01 sec)
SELECT c1.id c1_id
, c2.id c2_id
FROM contact c1
JOIN contact c2
ON c1.firstname = c2.firstname
AND c1.lastname = c2.lastname
AND c1.type=c2.type
WHERE c1.id < c2.id;
...
...
| 1006 | 1008 |
+-------+-------+
5634 rows in set (0.16 sec)
So, now let's add an index on (firstname,lastname,type)...
DROP TABLE IF EXISTS contact;
CREATE TABLE contact
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,firstname VARCHAR(255) NOT NULL
,lastname VARCHAR(255) NOT NULL
,type ENUM('small','medium','large') NOT NULL
,INDEX(firstname,lastname,type)
);
INSERT INTO contact VALUES
(NULL,'John','Brown','small'),
(NULL,'Bill','Red','small'),
(NULL,'Paul','Orange','medium'),
(NULL,'Mike','Green','large'),
(NULL,'John','Scarlet','small'),
(NULL,'John','Cyan','medium'),
(NULL,'Fiona','Brown','large'),
(NULL,'John','Brown','small'),
(NULL,'Chris','Copper','medium'),
(NULL,'Steve','Silver','large');
INSERT INTO contact SELECT NULL,x.firstname, y.lastname, z.type FROM contact x, contact y, contact z;
SELECT c1.id c1_id
, c2.id c2_id
FROM contact c1
JOIN contact c2
ON c1.firstname = c2.firstname
AND c1.lastname = c2.lastname
AND c1.type = c2.type
AND c1.id < c2.id;
| 775 | 776 |
...
| 1006 | 1008 |
+-------+-------+
5634 rows in set (0.05 sec)
Upvotes: 1
Reputation: 49089
You can try with something like this:
SELECT GROUP_CONCAT(DISTINCT id), firstname, lastname
FROM contact
GROUP BY firstname, lastname
HAVING COUNT(DISTINCT id)>1
this will return all duplicated names. If you want the IDs you could then use a JOIN:
SELECT
contact.id
FROM
contact INNER JOIN (SELECT firstname, lastname
FROM contact
GROUP BY firstname, lastname
HAVING COUNT(DISTINCT id)>1) dup
ON contact.firstname=dup.firstname AND contact.lastname=dup.lastname
Upvotes: 4
Reputation: 4529
Small deviation from fthiella's answer (just incase this is what you need):
SELECT group_concat(id) as ids, firstname, lastname
FROM contact
GROUP BY firstname, lastname
The query above will fill column ids with a comma seperated list of id's for each firstname+lastname combo.
Upvotes: 1