Reputation: 42863
I have table like this
name | personal_number
-----------------------------------------
Jon | 222
Alex | 555
Jon | 222
Jimmy | 999
I need get every name, which personal_number repeates in table more than 1, that is result must be:
Jon
Jon
So, Variant 1):
SELECT name FROM mytable WHERE personal_number IN (
SELECT personal_number FROM mytable GROUP BY personal_number
HAVING COUNT(*) > 1
)
Variant 2):
SELECT personal_number FROM mytable GROUP BY personal_number
HAVING COUNT(*) > 1
)
Then, using php, retrieved personal_numbers join as string (soemthing like this '222', '222'
) and run other query
SELECT name FROM mytable WHERE personal_number IN( here joined string )
Variant 2 works approximately 10 times faster, than variant 1, this is surprise for me, I was thinking that one query will be faster, but...
(In table is 500 000 rows, column personal_number
not indexed)
So, what you mean about cases like this? why variant 2 is many faster than variant 1 ?
Upvotes: 17
Views: 5096
Reputation: 13340
First query has heavy subquery. You must avoid this. The best solution for your problem is only one query:
SELECT name FROM mytable GROUP BY personal_number HAVING COUNT(*) > 1;
This query will return you each repeated name only once. If you want to display the name of the duplicate as many times as they met you must use next query:
SELECT name, COUNT(*) AS count FROM mytable GROUP BY personal_number HAVING COUNT(*) > 1;
And then in PHP do something like this:
foreach ($rows as $row) {
for ($i = 0; $i++; $i < $row['count']) {
echo $row['name'] . "\n";
}
}
Upvotes: 1
Reputation: 17371
It seems that subqueries are very slow as mentioned in this article http://www.mysqlperformanceblog.com/2010/10/25/mysql-limitations-part-3-subqueries.
You should try to avoid having subqueries and use joining instead.
Upvotes: 5
Reputation: 19111
This should be quicker:
SELECT name FROM mytable join (
SELECT personal_number FROM mytable GROUP BY personal_number
HAVING COUNT(*) > 1
)a using (personel_number)
Edit: If this is faster than variant 1, then it means at variant 1 mysql reproduces the inner table for each record again and again.
Upvotes: 0
Reputation: 101
Since indexing is not done so the 1 is slow,as it has to match personal_numbers from selected personal_numbers. If indexing is done it consumes less time than earlier. Variant 2 is a direct query hence its faster.
Upvotes: 0