lost_in
lost_in

Reputation: 75

why sql with 'exists' run slower than 'in' using MySQL

I am a newbie in MySQL optimization, i found a amazing thing:sql with 'exists' run slower than using 'in' !!!

following is my DDL:

mysql> `show create table order\G`;
*************************** 1. row ***************************
       Table: order
Create Table: CREATE TABLE `order` (
  `id` int(4) NOT NULL AUTO_INCREMENT,
  `employee_id` int(4) NOT NULL,
  `price` decimal(7,2) NOT NULL,
  `trade_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `fk_employee_id` (`employee_id`),
  CONSTRAINT `fk_employee_id` FOREIGN KEY (`employee_id`) REFERENCES `employee` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=231001 DEFAULT CHARSET=utf8



mysql> `show create table order_detail\G`;
*************************** 1. row ***************************
       Table: order_detail
Create Table: CREATE TABLE `order_detail` (
  `id` int(4) NOT NULL AUTO_INCREMENT,
  `menu_id` int(4) NOT NULL,
  `order_id` int(4) NOT NULL,
  `amount` int(4) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_menu_id` (`menu_id`),
  KEY `fk_order_id` (`order_id`),
  CONSTRAINT `fk_menu_id` FOREIGN KEY (`menu_id`) REFERENCES `menu` (`id`),
  CONSTRAINT `fk_order_id` FOREIGN KEY (`order_id`) REFERENCES `order` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1037992 DEFAULT CHARSET=utf8



Query Solution 1: use exists
---------------------------------

mysql> `SELECT count(`order`.id) FROM `order` WHERE EXISTS ( SELECT 1 FROM order_detail WHERE order_detail.order_id = `order`.id GROUP BY (order_detail.order_id) HAVING COUNT(order_id) > 5 );`
+-------------------+
| count(`order`.id) |
+-------------------+
|             92054 |
+-------------------+
1 row in set (***6.53 sec***)

mysql> `explain SELECT count(`order`.id) FROM `order` WHERE EXISTS ( SELECT 1 FROM order_detail WHERE order_detail.order_id = `order`.id GROUP BY (order_detail.order_id) HAVING COUNT(order_id) > 5 )\G;`
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: order
         type: index
possible_keys: NULL
          key: fk_employee_id
      key_len: 4
          ref: NULL
         rows: 231032
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: order_detail
         type: ref
possible_keys: fk_order_id
          key: fk_order_id
      key_len: 4
          ref: performance_test.order.id
         rows: 3
        Extra: Using where; Using index
2 rows in set (0.01 sec)


Query solution 2: use in
------------------------

mysql> `SELECT count(`order`.id) FROM `order` WHERE `order`.id IN ( SELECT order_detail.order_id FROM order_detail GROUP BY (order_detail.order_id) HAVING COUNT(order_id) > 5 ) ;`
+-------------------+
| count(`order`.id) |
+-------------------+
|             92054 |
+-------------------+
1 row in set (***3.88 sec***)

mysql> `explain SELECT count(`order`.id) FROM `order` WHERE `order`.id IN ( SELECT order_detail.order_id FROM order_detail GROUP BY (order_detail.order_id) HAVING COUNT(order_id) > 5 ) \G;`<br>
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: order
         type: index
possible_keys: NULL
          key: fk_employee_id
      key_len: 4
          ref: NULL
         rows: 231032
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: order_detail
         type: index
possible_keys: fk_order_id
          key: fk_order_id
      key_len: 4
          ref: NULL
         rows: 1036314
        Extra: Using index
2 rows in set (0.00 sec)

Upvotes: 4

Views: 3573

Answers (2)

user1133275
user1133275

Reputation: 2735

'EXISTS' can work faster than 'IN' if the parent and sub sets are both large.

  • because O(n*log(n)) is faster than O(n*n) for large sets

'EXISTS' can work slower than 'IN' if the sub set is small.

  • because of query structure and resulting query planner changes; eg 30%(1.5s) slower.

Upvotes: 3

Abhik Dey
Abhik Dey

Reputation: 403

I think you have a little confusion,you are having a wrong idea,'EXISTS' works faster than 'IN' and I'm trying to make you understood the reason..

EXISTS returns a boolean, and will return a boolean on the first match. So if you're dealing with duplicates/multiples, 'EXISTS' will be faster to execute than 'IN' or 'JOIN' depending on the data and the needs.

Whereas, 'IN' is syntactic sugar for OR clauses. While it's very accommodating, there are issues with dealing with lots of values for that comparison (north of 1,000).In case of duplicates/multiples 'IN' checks all the values that exist which naturally consumes more time to execute than 'EXISTS',that's why 'IN' is always comparatively slower than 'EXISTS'.

I hope that I clarified your confusion.

Upvotes: 2

Related Questions