Reputation: 337
I'm running Percona Server 5.7.16-10-log on Ubuntu 16.04.1 LTS.
I recently upgraded my AWS EC2 instance from m4.2xlarge to m4.4xlarge. Since the upgrade, my MySQL error log periodically fills up with lines such as:
2017-09-15T17:10:30.678258+05:30 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4460ms. The settings might not be optimal. (flushed=140, during the time.)
2017-09-15T17:14:31.048443+05:30 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 6047ms. The settings might not be optimal. (flushed=107, during the time.)
2017-09-15T17:15:06.312119+05:30 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 8262ms. The settings might not be optimal. (flushed=64, during the time.)
These messages are most frequent when Percona's innobackupex runs on its schedule for full and incremental backups. At those times, iostat reports the following disk usage:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdf 0.00 10.00 0.00 2.00 0.00 48.00 48.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdb 21.00 0.00 672.00 816.00 63328.00 104448.00 225.51 30.65 20.74 19.28 21.94 0.67 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdb 0.00 4.95 630.69 802.97 62875.25 102677.23 230.95 30.27 21.11 19.80 22.14 0.69 99.01
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdb 23.00 153.00 988.00 668.00 94888.00 72104.00 201.68 26.84 16.21 14.57 18.62 0.59 98.40
I keep getting these messages in the MySQL error log even after lowering the LRU scan depth from its default of 1024, a change recommended in several similar posts on the web:
SET GLOBAL innodb_lru_scan_depth=256;
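To make the lowered value survive a restart, the same setting can also go into the # INNODB # section of my.cnf (a sketch, using the dashed option style the file already uses):

```ini
# INNODB #
innodb-lru-scan-depth = 256
```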
Following another recommendation, I also tried to determine whether the problem lies with the LRU list or with the flush list by running:
SHOW ENGINE INNODB STATUS\G
In the output below, I occasionally saw a flush list count greater than zero under Pending writes:
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 50300190720
Dictionary memory allocated 18819582
Internal hash tables (constant factor + variable factor)
Adaptive hash index 3805993024 (764908736 + 3041084288)
Page hash 5976584 (buffer pool 0 only)
Dictionary cache 210046766 (191227184 + 18819582)
File system 1570552 (812272 + 758280)
Lock system 119847848 (119530904 + 316944)
Recovery system 0 (0 + 0)
Buffer pool size 2948760
Buffer pool size, bytes 0
Free buffers 417019
Database pages 2346130
Old database pages 866201
Modified db pages 60127
Pending reads 0
Pending writes: LRU 0, flush list 10, single page 0
Pages made young 5082, not young 0
0.17 youngs/s, 0.00 non-youngs/s
Pages read 2139505, created 206625, written 1680831
1323.32 reads/s, 0.70 creates/s, 21.87 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 1314.62/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2346130, unzip_LRU len: 0
I/O sum[0]:cur[128], unzip sum[0]:cur[0]
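For context, a quick back-of-the-envelope check on those figures (numbers copied from the status output above; 16 KB is the default innodb_page_size):

```python
# Figures from the SHOW ENGINE INNODB STATUS output above
buffer_pool_pages = 2948760   # "Buffer pool size"
modified_pages = 60127        # "Modified db pages" (dirty pages)
free_pages = 417019           # "Free buffers"
page_size_kb = 16             # default innodb_page_size is 16 KB

dirty_pct = 100 * modified_pages / buffer_pool_pages
free_pct = 100 * free_pages / buffer_pool_pages
dirty_gb = modified_pages * page_size_kb / 1024 ** 2

print(f"dirty: {dirty_pct:.1f}% of the pool ({dirty_gb:.2f} GB), free: {free_pct:.1f}%")
```

So only about 2% of the 45G pool is dirty at that moment; the page_cleaner warnings line up with the disk being saturated (xvdb at ~100% util in the iostat output) rather than with an oversized flush backlog.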
Here's a snippet of the my.cnf that I use:
[mysqld]
# GENERAL #
user = mysql
default-storage-engine = InnoDB
socket = /var/run/mysqld/mysqld.sock
pid-file = /var/run/mysqld/mysqld.pid
basedir = /usr
tmpdir = /var/tmp
lc-messages-dir = /usr/share/mysql
explicit_defaults_for_timestamp
log_timestamps = SYSTEM
skip-name-resolve
net-write-timeout = 600
net-read-timeout = 600
innodb-page-cleaners = 8
performance-schema = OFF
# MyISAM #
key-buffer-size = 32M
# SAFETY #
max-allowed-packet = 16M
max-connect-errors = 1000000
sysdate-is-now = 1
# DATA STORAGE #
datadir = /var/lib/mysql/
# BINARY LOGGING #
server-id = 2
log-bin = /var/lib/mysql/mysql-bin
expire-logs-days = 14
sync-binlog = 1
max_binlog_size = 1G
max_binlog_files = 20
# CACHES AND LIMITS #
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
max-connections = 500
thread-cache-size = 50
open-files-limit = 4510
table-definition-cache = 4096
table-open-cache = 4096
# INNODB #
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 512M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table = 1
innodb-buffer-pool-size = 45G
innodb-print-all-deadlocks = ON
# LOGGING #
log-error = /var/log/mysql/mysql-error.log
slow-query-log-file = /var/log/mysql/mysql-slow.log
lower-case-table-names = 1
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
# SLOW QUERY LOG #
slow-query-log = 1
long-query-time = 10
slow-query-log-always-write-time = 15
log-slow-verbosity = full
log-slow-rate-type = query
log-slow-rate-limit = 100 # queries logged/sec
log-queries-not-using-indexes = 1
Any advice or pointers for troubleshooting and resolving this issue would be greatly appreciated.
Here's what ulimit -a looks like on my production server:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257583
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 257583
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Upvotes: 2
Views: 9471
Reputation: 2343
Things to do in my.cnf/ini, in the [mysqld] section:
Move innodb_page_cleaners into the # INNODB # section and change it to
innodb_page_cleaners=64 # will be auto-limited to innodb_buffer_pool_instances
Add a line:
innodb_buffer_pool_instances=8 # to reduce mutex contention
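Taken together, the two changes above would land in the # INNODB # block of my.cnf like this (these are the suggested values; since innodb_page_cleaners is capped at innodb_buffer_pool_instances, the effective cleaner count here would be 8):

```ini
# INNODB #
innodb-page-cleaners = 64          # capped at innodb_buffer_pool_instances, so effectively 8
innodb-buffer-pool-instances = 8   # split the 45G pool to reduce mutex contention
```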
Please POST the results of ulimit -a in your original question so we can see the Ubuntu limits.
Upvotes: 0
Reputation: 2343
Your open files limit is 1024; raise it to 90000. The per-file OS cost in RAM is next to nothing. Verify your change once implemented. Then, in your configuration, make only one change at a time: monitor for negative impact, then make the next change the following working day, please. Many of these can be changed dynamically; see the user guide.
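On Ubuntu 16.04 the bundled systemd unit ignores /etc/security/limits.conf for the mysql service, so the OS-level limit is usually raised with a drop-in override (a sketch, assuming the stock mysql.service unit name):

```ini
# /etc/systemd/system/mysql.service.d/override.conf
[Service]
LimitNOFILE=90000
```

After creating the file, run systemctl daemon-reload and restart the service, then verify by checking /proc/$(pidof mysqld)/limits.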
thread_cache_size=100 # from 50, to minimize threads_created
open_files_limit=30000 # from 4510
table_definition_cache=10000 # from 4096
table_open_cache=10000 # from 4096
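Of the four, thread_cache_size, table_definition_cache, and table_open_cache are dynamic in 5.7 and can be applied at runtime; open_files_limit is read-only and needs the my.cnf change plus a restart. A sketch of the runtime changes:

```sql
-- Apply one change per working day, as advised above;
-- persist each value in my.cnf as well so it survives a restart.
SET GLOBAL thread_cache_size = 100;
SET GLOBAL table_definition_cache = 10000;
SET GLOBAL table_open_cache = 10000;
```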
These few days of modification and monitoring will improve your instance.
ON ANOTHER DAY (or for an hour), for a single focus on queries not using indexes:
slow_query_log=1 # for ON
min_examined_row_limit=1 # to eliminate useless chatter
log_queries_not_using_indexes=1
long_query_time=5000 # 5000 seconds, which will be RARE
This avoids a mixed bag of slow-query and queries-not-using-indexes information.
When you are finished with this single focus:
min_examined_row_limit=0 # back to the default
log_queries_not_using_indexes=0
long_query_time=10 # or less, depending on your need to monitor
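All of these toggles are dynamic in 5.7, so the focus session needs no restart (a sketch):

```sql
-- Start the "queries not using indexes" focus session
SET GLOBAL slow_query_log = 1;
SET GLOBAL min_examined_row_limit = 1;
SET GLOBAL log_queries_not_using_indexes = 1;
SET GLOBAL long_query_time = 5000;

-- When finished, return to normal slow-query logging
SET GLOBAL min_examined_row_limit = 0;
SET GLOBAL log_queries_not_using_indexes = 0;
SET GLOBAL long_query_time = 10;
```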
For further analysis: how much RAM do you have, and are you on SSDs or rotating hard drives? Please post TEXT results of A) SHOW GLOBAL STATUS; and B) SHOW GLOBAL VARIABLES;
Upvotes: 1