Reputation: 63
I did a MySQL performance optimization test, but the test results surprised me.
First of all, I prepared several tables for my test, which are "t_worker_attendance_300w(3 million data), t_worker_attendance_1000w(10 million data), t_worker_attendance_1y(100 million data), t_worker_attendance_4y(400 million data)".
Each table has the same field, the same index, they are copied, including 400 million data volume is also increased from 3 million data.
In my understanding, MySQL's performance is bound to be severely affected by the size of the data volume, but the results have puzzled me for a whole week. I've almost tested the scenarios I can think of, but their execution times are the same!
This is a new MySQL 5.6.16 server,I tested any scenario I could think of, including INNER JOIN....
A) SHOW CREATE TABLE t_worker_attendance_4y
CREATE TABLE `t_worker_attendance_4y` (
`id` bigint(20) NOT NULL ,
`attendance_id` char(32) NOT NULL,
`worker_id` char(32) NOT NULL,
`subcontractor_id` char(32) NOT NULL ,
`project_id` char(32) NOT NULL ,
`sign_date` date NOT NULL ,
`sign_type` char(2) NOT NULL ,
`latitude` double DEFAULT NULL,
`longitude` double DEFAULT NULL ,
`sign_wages` decimal(16,2) DEFAULT NULL ,
`confirm_wages` decimal(16,2) DEFAULT NULL ,
`work_content` varchar(60) DEFAULT NULL ,
`team_leader_id` char(32) DEFAULT NULL,
`sign_state` char(2) NOT NULL ,
`confirm_date` date DEFAULT NULL ,
`sign_mode` char(2) DEFAULT NULL ,
`checkin_time` datetime DEFAULT NULL ,
`checkout_time` datetime DEFAULT NULL ,
`sign_hours` decimal(6,1) DEFAULT NULL ,
`overtime` decimal(6,1) DEFAULT NULL ,
`confirm_hours` decimal(6,1) DEFAULT NULL ,
`signimg` varchar(200) DEFAULT NULL ,
`signoutimg` varchar(200) DEFAULT NULL ,
`photocheck` char(2) DEFAULT NULL ,
`machine_type` varchar(2) DEFAULT '1' ,
`project_coordinate` text ,
`floor_num` varchar(200) DEFAULT NULL ,
`device_serial_no` varchar(32) DEFAULT NULL ,
KEY `checkin_time` (`checkin_time`),
KEY `worker_id` (`worker_id`),
KEY `project_id` (`project_id`),
KEY `subcontractor_id` (`subcontractor_id`),
KEY `sign_date` (`sign_date`),
KEY `project_id_2` (`project_id`,`sign_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
B) SHOW INDEX FROM t_worker_attendance_4y
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_worker_attendance_4y | 1 | checkin_time | 1 | checkin_time | A | 5017494 | NULL | NULL | YES | BTREE | | |
| t_worker_attendance_4y | 1 | worker_id | 1 | worker_id | A | 1686552 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id | 1 | project_id | A | 102450 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | subcontractor_id | 1 | subcontractor_id | A | 380473 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | sign_date | 1 | sign_date | A | 512643 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id_2 | 1 | project_id | A | 102059 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id_2 | 2 | sign_date | A | 1776104 | NULL | NULL | | BTREE | | |
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
C) EXPLAIN SELECT SQL_NO_CACHE tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb' AND sign_date >= '07/01/2018' AND sign_date < '08/01/2018' ;
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
| 1 | SIMPLE | tw | ref | project_id,sign_date,project_id_2 | project_id_2 | 96 | const | 54134596 | Using where; Using index |
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
They all went through the same joint index.
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_300w tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sgin_date >= '07/01/2018'
AND sgin_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_1000w tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sgin_date >= '07/01/2018'
AND sgin_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.01 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_1y tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sgin_date >= '07/01/2018'
AND sgin_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sgin_date >= '07/01/2018'
AND sgin_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
......
My guess is that MySQL's query performance will decline dramatically with the increase of data volume, but they are not much different. So I have no way to optimize my query. I don't know when to implement table partition plan or sub-database sub-table plan.
What I want to know is why the execution speed of index with small data volume is the same as that of index with large data volume. If you can help me, I would like to thank you very much.
Upvotes: 2
Views: 124
Reputation: 63
I have a new answer, someone told me "Because your query is covered by index, index is actually the time of query index. Mysql index uses B + tree structure. The query time is basically the same under the same tree height. You can calculate whether the height of the trees indexed by these tables is the same."
So I did the inquiry as required.
mysql> SELECT b.name, a.name, index_id, type, a.space, a.PAGE_NO
-> FROM information_schema.INNODB_SYS_INDEXES a,
-> information_schema.INNODB_SYS_TABLES b
-> WHERE a.table_id = b.table_id AND a.space <> 0;
+-------------------------------------------------+---------------------+----------+------+-------+---------+
| name | name | index_id | type | space | PAGE_NO |
+-------------------------------------------------+---------------------+----------+------+-------+---------+
| mysql/innodb_index_stats | PRIMARY | 18 | 3 | 2 | 3 |
| mysql/innodb_table_stats | PRIMARY | 17 | 3 | 1 | 3 |
| mysql/slave_master_info | PRIMARY | 20 | 3 | 4 | 3 |
| mysql/slave_relay_log_info | PRIMARY | 19 | 3 | 3 | 3 |
| mysql/slave_worker_info | PRIMARY | 21 | 3 | 5 | 3 |
| test_gomeet/t_worker_attendance_1y | GEN_CLUST_INDEX | 45 | 1 | 12 | 3 |
| test_gomeet/t_worker_attendance_1y | checkin_time | 46 | 0 | 12 | 16389 |
| test_gomeet/t_worker_attendance_1y | project_id | 50 | 0 | 12 | 32775 |
| test_gomeet/t_worker_attendance_1y | worker_id | 53 | 0 | 12 | 49161 |
| test_gomeet/t_worker_attendance_1y | subcontractor_id | 54 | 0 | 12 | 65547 |
| test_gomeet/t_worker_attendance_1y | sign_date | 66 | 0 | 12 | 81933 |
| test_gomeet/t_worker_attendance_1y | project_id_2 | 408 | 0 | 12 | 98319 |
| test_gomeet/t_worker_attendance_300w | GEN_CLUST_INDEX | 56 | 1 | 13 | 3 |
| test_gomeet/t_worker_attendance_300w | checkin_time | 58 | 0 | 13 | 16389 |
| test_gomeet/t_worker_attendance_300w | project_id | 59 | 0 | 13 | 16427 |
| test_gomeet/t_worker_attendance_300w | worker_id | 60 | 0 | 13 | 16428 |
| test_gomeet/t_worker_attendance_300w | subcontractor_id | 61 | 0 | 13 | 16429 |
| test_gomeet/t_worker_attendance_300w | sign_date | 67 | 0 | 13 | 65570 |
| test_gomeet/t_worker_attendance_300w | project_id_2 | 397 | 0 | 13 | 81929 |
| test_gomeet/t_worker_attendance_4y | GEN_CLUST_INDEX | 42 | 1 | 9 | 3 |
| test_gomeet/t_worker_attendance_4y | checkin_time | 47 | 0 | 9 | 16389 |
| test_gomeet/t_worker_attendance_4y | worker_id | 49 | 0 | 9 | 32775 |
| test_gomeet/t_worker_attendance_4y | project_id | 52 | 0 | 9 | 49161 |
| test_gomeet/t_worker_attendance_4y | subcontractor_id | 55 | 0 | 9 | 65547 |
| test_gomeet/t_worker_attendance_4y | sign_date | 69 | 0 | 9 | 81933 |
| test_gomeet/t_worker_attendance_4y | project_id_2 | 412 | 0 | 9 | 98319 |
+-------------------------------------------------+---------------------+----------+------+-------+---------+
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_page_size';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| Innodb_page_size | 16384 |
+------------------+-------+
root@localhost:/usr/local/mysql/data/test_gomeet# hexdump -s 49216 -n 02 t_worker_attendance_300w.ibd
000c040 0200
000c042
root@localhost:/usr/local/mysql/data/test_gomeet# hexdump -s 49216 -n 02 t_worker_attendance_1y.ibd
000c040 0300
000c042
root@localhost:/usr/local/mysql/data/test_gomeet# hexdump -s 49216 -n 02 t_worker_attendance_4y.ibd
000c040 0300
000c042
The calculation shows that 3.34 is 100 million and 3.589 is 400 million. It's almost the same. Is it because of this?
Upvotes: 0
Reputation: 14501
Same search performance on large data volume because of BTREE index. It has O(log(n))
. Relatively speaking that means that search algorithm have to complete:
6 operations on 3m of data
7 operations on 10m of data
8 operations on 100m of data
8 operations on 400m of data
Аs you can see the number of operations is almost the same.
My guess is that MySQL's query performance will decline dramatically with the increase of data volume
This is true for full table scan cases.
Upvotes: 1