Reputation: 75
I am a newbie in MySQL optimization, i found a amazing thing:sql with 'exists' run slower than using 'in' !!!
mysql> `show create table order\G`;
*************************** 1. row ***************************
Table: order
Create Table: CREATE TABLE `order` (
`id` int(4) NOT NULL AUTO_INCREMENT,
`employee_id` int(4) NOT NULL,
`price` decimal(7,2) NOT NULL,
`trade_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `fk_employee_id` (`employee_id`),
CONSTRAINT `fk_employee_id` FOREIGN KEY (`employee_id`) REFERENCES `employee` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=231001 DEFAULT CHARSET=utf8
mysql> `show create table order_detail\G`;
*************************** 1. row ***************************
Table: order_detail
Create Table: CREATE TABLE `order_detail` (
`id` int(4) NOT NULL AUTO_INCREMENT,
`menu_id` int(4) NOT NULL,
`order_id` int(4) NOT NULL,
`amount` int(4) NOT NULL,
PRIMARY KEY (`id`),
KEY `fk_menu_id` (`menu_id`),
KEY `fk_order_id` (`order_id`),
CONSTRAINT `fk_menu_id` FOREIGN KEY (`menu_id`) REFERENCES `menu` (`id`),
CONSTRAINT `fk_order_id` FOREIGN KEY (`order_id`) REFERENCES `order` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1037992 DEFAULT CHARSET=utf8
Query Solution 1: use exists
---------------------------------
mysql> `SELECT count(`order`.id) FROM `order` WHERE EXISTS ( SELECT 1 FROM order_detail WHERE order_detail.order_id = `order`.id GROUP BY (order_detail.order_id) HAVING COUNT(order_id) > 5 );`
+-------------------+
| count(`order`.id) |
+-------------------+
| 92054 |
+-------------------+
1 row in set (***6.53 sec***)
mysql> `explain SELECT count(`order`.id) FROM `order` WHERE EXISTS ( SELECT 1 FROM order_detail WHERE order_detail.order_id = `order`.id GROUP BY (order_detail.order_id) HAVING COUNT(order_id) > 5 )\G;`
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: order
type: index
possible_keys: NULL
key: fk_employee_id
key_len: 4
ref: NULL
rows: 231032
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: order_detail
type: ref
possible_keys: fk_order_id
key: fk_order_id
key_len: 4
ref: performance_test.order.id
rows: 3
Extra: Using where; Using index
2 rows in set (0.01 sec)
Query solution 2: use in
------------------------
mysql> `SELECT count(`order`.id) FROM `order` WHERE `order`.id IN ( SELECT order_detail.order_id FROM order_detail GROUP BY (order_detail.order_id) HAVING COUNT(order_id) > 5 ) ;`
+-------------------+
| count(`order`.id) |
+-------------------+
| 92054 |
+-------------------+
1 row in set (***3.88 sec***)
mysql> `explain SELECT count(`order`.id) FROM `order` WHERE `order`.id IN ( SELECT order_detail.order_id FROM order_detail GROUP BY (order_detail.order_id) HAVING COUNT(order_id) > 5 ) \G;`<br>
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: order
type: index
possible_keys: NULL
key: fk_employee_id
key_len: 4
ref: NULL
rows: 231032
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: SUBQUERY
table: order_detail
type: index
possible_keys: fk_order_id
key: fk_order_id
key_len: 4
ref: NULL
rows: 1036314
Extra: Using index
2 rows in set (0.00 sec)
Upvotes: 4
Views: 3573
Reputation: 2735
'EXISTS' can work faster than 'IN' if the parent and sub sets are both large.
'EXISTS' can work slower than 'IN' if the sub set is small.
Upvotes: 3
Reputation: 403
I think you have a little confusion,you are having a wrong idea,'EXISTS' works faster than 'IN' and I'm trying to make you understood the reason..
EXISTS returns a boolean, and will return a boolean on the first match. So if you're dealing with duplicates/multiples, 'EXISTS' will be faster to execute than 'IN' or 'JOIN' depending on the data and the needs.
Whereas, 'IN' is syntactic sugar for OR clauses. While it's very accommodating, there are issues with dealing with lots of values for that comparison (north of 1,000).In case of duplicates/multiples 'IN' checks all the values that exist which naturally consumes more time to execute than 'EXISTS',that's why 'IN' is always comparatively slower than 'EXISTS'.
I hope that I clarified your confusion.
Upvotes: 2