Reputation: 21
Suppose there are messages (mostly incoming) that get sent/received quite fast: 80–100 per second. Those get written to the table. We have around 300 million of them by now, and that pushes us to change how the data is organized. Initially the table structure was quite simple and naive: `ID` as the PRIMARY KEY and two separate indexes on `Arrival` and `SenderID`.
According to different sources, the following structure seems to be appropriate:
CREATE TABLE `messages` (
`ID` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`Arrival` TIMESTAMP NOT NULL,
`SenderID` INT UNSIGNED NOT NULL,
`Incoming` TINYINT(1) NOT NULL,
-- Other fields that describe the message skipped
PRIMARY KEY (`SenderID`, `Arrival`, `ID`) USING BTREE,
INDEX `ID` (`ID`) USING BTREE,
INDEX `Arrival_SenderID` (`Arrival`, `SenderID`) USING BTREE
)
COLLATE='utf8mb4_unicode_ci'
PARTITION BY RANGE (unix_timestamp(`Arrival`))
(PARTITION past VALUES LESS THAN (1717189200) ENGINE = InnoDB,
PARTITION p202406 VALUES LESS THAN (1719781200) ENGINE = InnoDB,
-- Partitions in between skipped
PARTITION p202512 VALUES LESS THAN (1767214800) ENGINE = InnoDB,
PARTITION future VALUES LESS THAN MAXVALUE ENGINE = InnoDB);
Queries are mostly for messages of a particular `SenderID` or for all senders, in both cases for a particular time interval. `ID` is used to ensure uniqueness; its index is both required by the AUTO_INCREMENT feature and expected to simplify data copying if we ever need it. Plus, a few other tables use the IDs to reference the records (no CONSTRAINTs involved, just plain values).
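For illustration, the typical query shapes look roughly like this (the sender and dates are placeholders; real queries select specific columns):

```sql
-- Messages of one sender within a time interval
SELECT * FROM messages
WHERE SenderID = 12345
  AND Arrival >= '2025-01-01' AND Arrival < '2025-02-01';

-- Messages of all senders within a time interval
SELECT * FROM messages
WHERE Arrival >= '2025-01-01' AND Arrival < '2025-02-01';
```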
This table is the largest one; the rest are pretty small and fast.
We run experiments on a server that is no longer in use but still has the 300-million-row table. Several copies of the original database (minus the `messages` table data) were created, and we benchmark different table structures by copying the data from the original database and then measuring query times with real-world queries.
Normally copying those 300 million records took 1–2 days in total. But somehow this particular structure makes the copying enormously slow. It is done (both for previous experiments and for this one) with queries like these:
INSERT INTO `testdb`.`messages` SELECT * FROM `originaldb`.`messages` WHERE `ID` BETWEEN 1 AND 10000000;
INSERT INTO `testdb`.`messages` SELECT * FROM `originaldb`.`messages` WHERE `ID` BETWEEN 10000001 AND 20000000;
...
The copy is split into chunks so we can see the progress and avoid MySQL exceeding the limits of some buffers (which happened when we tried to copy the data in a single such query).
It took around 29 hours for the first chunk. We found bug reports mentioning that innodb_adaptive_hash_index might cause bad performance like that, so copying was stopped, the feature disabled, and the whole experiment restarted from the beginning.
The first 2 million records took around 10 minutes, but then everything slowed down: within the first hour (so far) it hasn't even finished the 3rd million.
The test server is not used by anything else and has more than 1.5 TB of free space left (the database itself takes only around 25–30 GB). It's MySQL 8.0.20.
Any ideas and suggestions are appreciated.
UPD. Not sure if this helps, but here's SHOW ENGINE INNODB STATUS after about 8 million rows inserted:
=====================================
2025-02-27 11:51:09 0x1a84 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 44 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 84420 srv_active, 0 srv_shutdown, 2 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 4086362
--Thread 2348 has waited at buf0flu.cc line 1635 for 0 seconds the semaphore:
Mutex at 00000215AD29E598, Mutex BUF_POOL_LRU_LIST created buf0buf.cc:1227, lock var 1
--Thread 1252 has waited at buf0buf.cc line 4643 for 0 seconds the semaphore:
Mutex at 00000215AD29E598, Mutex BUF_POOL_LRU_LIST created buf0buf.cc:1227, lock var 1
OS WAIT ARRAY INFO: signal count 1207206
RW-shared spins 24862, rounds 24866, OS waits 4
RW-excl spins 2785, rounds 77298, OS waits 2561
RW-sx spins 310, rounds 9240, OS waits 307
Spin rounds per wait: 1.00 RW-shared, 27.76 RW-excl, 29.81 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 329387481
Purge done for trx's n:o < 329387481 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 283767129662384, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283767129660736, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283767129659912, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 329387336, ACTIVE 85603 sec inserting, thread declared inside InnoDB 3111
mysql tables in use 2, locked 2
25755 lock struct(s), heap size 3743952, 8739195 row lock(s), undo log entries 8713444
MySQL thread id 8, OS thread handle 3208, query id 517 localhost 127.0.0.1 root executing
INSERT INTO `testdb`.`messages` SELECT * FROM `originaldb`.`messages` WHERE `ID` BETWEEN 1 AND 10000000
---TRANSACTION 329380910, ACTIVE 86147 sec recovered trx
ROLLING BACK 1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 194935309
--------
FILE I/O
--------
I/O thread 0 state: wait Windows aio (insert buffer thread)
I/O thread 1 state: wait Windows aio (log thread)
I/O thread 2 state: wait Windows aio (read thread)
I/O thread 3 state: wait Windows aio (read thread)
I/O thread 4 state: wait Windows aio (read thread)
I/O thread 5 state: wait Windows aio (read thread)
I/O thread 6 state: wait Windows aio (write thread)
I/O thread 7 state: wait Windows aio (write thread)
I/O thread 8 state: complete io for buf page (write thread)
I/O thread 9 state: complete io for buf page (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 1, 1] ,
ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 1; buffer pool: 76
5625836 OS file reads, 13062964 OS file writes, 4442547 OS fsyncs
62.77 reads/s, 16384 avg bytes/read, 150.25 writes/s, 50.25 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 1908, seg size 1910, 0 merges
merged operations:
insert 0, delete mark 0, delete 0
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
0.00 hash searches/s, 351.72 non-hash searches/s
---
LOG
---
Log sequence number 518180430405
Log buffer assigned up to 518180430405
Log buffer completed up to 518180430405
Log written up to 518180430405
Log flushed up to 518180429297
Added dirty pages up to 518180430405
Pages flushed up to 518164401302
Last checkpoint at 518164401302
5349965 log i/o's done, 60.36 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137428992
Dictionary memory allocated 916291
Buffer pool size 8192
Free buffers 2
Database pages 7961
Old database pages 2919
Modified db pages 7038
Pending reads 0
Pending writes: LRU 6, flush list 0, single page 0
Pages made young 1867654, not young 12082459
0.00 youngs/s, 0.00 non-youngs/s
Pages read 5625439, created 91520, written 5881777
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 959 / 1000, young-making rate 15 / 1000 not 72 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 7961, unzip_LRU len: 0
I/O sum[5980]:cur[55], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
1 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=704, Main thread ID=0000000000000EA8 , state=sleeping
Number of rows inserted 8804050, updated 0, deleted 0, read 8713446
75.04 inserts/s, 0.00 updates/s, 0.00 deletes/s, 75.04 reads/s
Number of system rows inserted 1039, updated 1484, deleted 983, read 477772
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
UPD2. The original naively indexed table was this:
CREATE TABLE IF NOT EXISTS `Messages` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`Arrival` timestamp NOT NULL DEFAULT current_timestamp(),
`SenderID` int(10) unsigned NOT NULL,
-- Other fields here
PRIMARY KEY (`ID`),
KEY `SenderID` (`SenderID`),
KEY `Arrival` (`Arrival`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Copying performance was acceptable both for a target table with the same structure and for one with the same fields but PRIMARY KEY(Arrival, SenderID, ID), KEY(ID), which took 12h40m for 300 million rows. But the partitioned target with supposedly optimal indexes is slow.
The number of distinct SenderIDs is around 12,000. The amount of RAM is 16 GB, and innodb_buffer_pool_size is 128 MB, which looks too little (I don't know why this value was chosen originally). But, if that matters, it's still unclear why copying the same data differs so much in speed between structures.
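As a back-of-envelope check of "looks too little" (assuming InnoDB's default 16 KiB page size on this server): if inserts keep one "current" index page warm per sender, the 12,000 senders alone need more hot pages than the 128 MB pool can hold:

```python
# Rough arithmetic only; 16 KiB is InnoDB's default page size
# (an assumption about this server's configuration).
senders = 12_000
page_size = 16 * 1024                # bytes per InnoDB page
buffer_pool = 128 * 1024 * 1024      # innodb_buffer_pool_size = 128M

hot_bytes = senders * page_size      # one hot index page per sender
print(hot_bytes / 2**20)             # 187.5 (MiB)
print(hot_bytes > buffer_pool)       # True: hot set alone exceeds the pool
```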
Not sure if this could affect performance, but there were a few forced terminations of MySQL: (1) due to lack of disk space, (2) due to a lock buffer overflow, and (3) to stop the unacceptably slow copying. But the first two happened before any of the copy attempts, including those that showed good performance.
Upvotes: 1
Views: 81
Reputation: 142483
Excellent design for your PARTITIONing. (My blog on Partitioning)
Possible criticism: Are those partitions "daily", and do you now have about 370? I would recommend switching to weekly. This would make queries that cannot use pruning run faster.
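If you do switch granularity, one way is to carve new partitions out of `future` with REORGANIZE PARTITION (boundary dates here are purely illustrative, and note that UNIX_TIMESTAMP depends on the server time zone; reorganizing the empty `future` partition is cheap, while reorganizing populated partitions copies their rows):

```sql
ALTER TABLE messages
    REORGANIZE PARTITION future INTO (
        PARTITION p2026w01 VALUES LESS THAN (UNIX_TIMESTAMP('2026-01-05')),
        PARTITION p2026w02 VALUES LESS THAN (UNIX_TIMESTAMP('2026-01-12')),
        PARTITION future   VALUES LESS THAN MAXVALUE
    );
```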
The fastest way to copy a non-partitioned table is via its PRIMARY KEY. If one sender_id does not have too many rows, simply copy one sender at a time. More discussion: see my blog on Chunking; though it does not address your specific case, it does discuss composite PRIMARY KEY issues and shows a simple way to chunk (ORDER BY ... LIMIT 1000,1) to get the next stopping point.
Since your table has good partitioning, let's discuss several options. [I don't know which will be best.] Chunk the data one of these ways:

- By sender_id, as mentioned above. That uses "locality of reference" but requires looking at all partitions.
- By partition. That uses partition pruning. This can be done either by specifying the partition name or by saying WHERE Arrival >= ... AND Arrival < ...
- Somehow use "transportable tablespaces" -- perhaps detach a partition, making it a "table"; copy that "table"; then reattach the "table" back to the original partitioned table. [I have never tried this. I don't know how much downtime would be needed.]
- LVM -- this plays games with the filesystem. [LVM may not be available on your OS.] Pro: copying the entire table takes seconds or minutes. Con: LVM has to be established at the OS and filesystem level beforehand.
For any chunking: whether there are 30 or 30,000 "copy-and-paste queries" doesn't matter much. I recommend 1000 rows at a time; that will run at about 99% of the theoretical maximum speed. (And it will avoid actually slowing down because of logs, etc.)
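A sketch of that LIMIT-based chunking for the per-sender approach (the @-variables are placeholders carried over from the previous chunk; assumes the PK (SenderID, Arrival, ID) from the question, so within one sender the PK order is (Arrival, ID)):

```sql
-- Find the exclusive stopping point of the next 1000-row chunk for one sender
SELECT Arrival, ID INTO @next_arrival, @next_id
FROM originaldb.messages
WHERE SenderID = @sender
  AND (Arrival, ID) > (@last_arrival, @last_id)
ORDER BY Arrival, ID
LIMIT 1000, 1;

-- Copy everything between the previous stopping point and the new one
INSERT INTO testdb.messages
SELECT * FROM originaldb.messages
WHERE SenderID = @sender
  AND (Arrival, ID) > (@last_arrival, @last_id)
  AND (Arrival, ID) < (@next_arrival, @next_id);
```

(When the SELECT returns no row, copy the remaining tail for that sender without the upper bound.)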
Why 128M for the buffer pool is too small

Your new PRIMARY KEY starts with SenderID; the partitioning neither helps nor hurts here. Further, note that all 3 indexes (including the clustered PK) need updating whenever a row is inserted. (A portion of the buffer pool is also reserved for delayed updating of non-unique secondary keys.)

A large insert will be performed in the order that the rows arrive. Let's analyze INSERT ... SELECT * ... WHERE ID BETWEEN ...:

- INDEX(ID) (and probably INDEX(Arrival, ...)) will each need only 16KB (1 block) at a time for the insert, since the copied rows arrive roughly in that order.
- The PK (SenderID, ...) will be jumping around among the senders. The math (12K senders * 16KB) says that a lot of the buffer pool will be needed -- essentially 192MB before having to bump a block out of the buffer pool. Thrashing.
- The PARTITIONing combined with PK(SenderID, ...) does not change the math much.

Another thought
Three ways to fill a table:
1. As you appear to do it -- CREATE TABLE with all the indexes in place. This leads to the thrashing I detailed above.
2. Have no indexes on the partitioned table while doing the INSERTs. Then do ALTER TABLE ADD PRIMARY KEY(...), INDEX(...), INDEX(...); This will (I think) speed up the index generation by doing the sorting on disk.
3. I prefer to have the PK already in the table while doing the inserts, then build the secondary indexes in an ALTER. But I do not know whether that is better or worse than delaying the PK.
If you have the time and space, this is simpler and better:
ALTER TABLE DROP ((all indexes, including PK))
ALTER TABLE .. ADD/REMOVE PARTITIONING ...
ALTER TABLE ADD ((the better set of indexes for the new layout))
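Sketched concretely for this table (abbreviated, and only a sketch: it assumes the original layout from UPD2 above, and note that an AUTO_INCREMENT column must keep some index at all times, so the first ALTER gives ID a plain index in the same statement that drops the PK):

```sql
ALTER TABLE Messages
    DROP PRIMARY KEY, ADD INDEX (ID),
    DROP INDEX SenderID,
    DROP INDEX Arrival;

ALTER TABLE Messages
    PARTITION BY RANGE (UNIX_TIMESTAMP(Arrival)) (
        PARTITION past   VALUES LESS THAN (1717189200),
        -- monthly partitions skipped
        PARTITION future VALUES LESS THAN MAXVALUE
    );

ALTER TABLE Messages
    ADD PRIMARY KEY (SenderID, Arrival, ID),
    ADD INDEX (Arrival, SenderID);
```

Each ALTER rebuilds this 300-million-row table, so expect each step to take a while.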
Upvotes: 1