Reputation: 1117
I have mysql 5.6.36 database where the size is ~35G running on CentOS 7.3 with 48G of RAM.
[UPDATE 17-08-06] I will update relevant information here.
I am seeing that my server runs out of memory and crashes even with ~48G of RAM. I could not keep it running on 24G, for example. A DB this size should be able to run on much less. Clearly, I a missing something fundamental.
[UPDATE: 17-08-05] By crashes, I mean mysqld stops and restarts with no useful information in the log, other than restarting from a crash. Also, with all this memory, I got this error during recovery:
[ERROR] InnoDB: space header page consists of zero bytes in tablespace ./ca_uim/t_qos_snapshot.ibd (table ca_uim/t_qos_snapshot)
The relevant portion of my config file looks like this [EDITED 17-08-05 to add missing lines]:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
lower_case_table_names = 1
symbolic-links=0
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
max_allowed_packet = 32M
max_connections = 300
table_definition_cache=2000
innodb_buffer_pool_size = 18G
innodb_buffer_pool_instances = 9
innodb_log_file_size = 1G
innodb_file_per_table=1
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
It was an oversight to use file per table, and I need to change that (I have 6000 tables, and most of those are partitioned).
After running for a short while (one hour), mytop shows this:
MySQL on 10.238.40.209 (5.6.36) load 0.95 1.08 1.01 1/1003 8525 up 0+01:31:01 [17:44:39]
Queries: 1.5M qps: 283 Slow: 22.0 Se/In/Up/De(%): 50/07/09/01
Sorts: 27 qps now: 706 Slow qps: 0.0 Threads: 118 ( 3/ 2) 43/28/01/00
Key Efficiency: 100.0% Bps in/out: 76.7k/176.8k Now in/out: 144.3k/292.1k
And free shows this:
# free -h
total used free shared buff/cache available
Mem: 47G 40G 1.5G 8.1M 5.1G 6.1G
Swap: 3.9G 508K 3.9G
Top shows this:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2010 mysql 20 0 45.624g 0.039t 9008 S 95.0 84.4 62:31.93 mysqld
How can this be? Is this related file per table? The entire DB could fit in memory. What am I doing wrong?
Upvotes: 24
Views: 40795
Reputation: 2343
Use of www.mysqlcalculator.com would be a quick way to get a brain check on about a dozen memory consumption factors in less than 2 minutes.
118 active threads may be reasonable but would seem to be causing extreme context switching trying to answer 118 questions simultaneously.
Would love to see your SHOW GLOBAL STATUS and SHOW GLOBAL VARIABLES, if you could get them posted.
Upvotes: 0
Reputation: 571
In your my.cnf (MySQL configuration) file:
Add a setting in [mysqld] block
[mysqld]
performance_schema = 0
For MySQL 5.7.8 onwards, you will have to add extra settings as below:
[mysqld]
performance_schema = 0
show_compatibility_56 = 1
NOTE: This will cut your Memory usage to more than 50%-60%. "show_compatibility_56" is optional, for some cases it works, better to check it once added to the config file.
Upvotes: 26
Reputation: 2343
Hoping you are a believer in ONLY one change at a time so you can track progress for a configuration reason. 2017-08-07 about 17:00 SHOW GLOBAL VARIABLES indicates innodb_buffer_pool_size
is 128M. Change in my.cnf
to 24G, shutdown/restart when permitted, please.
A) max_allowed_packet_size
at 1G is likely what you meant in your configuration, considering on 8/7/2017 your remote agents are sending 1G packets for processing on this equipment. How are remote agents managed in terms of scheduling their sending of data to prevent exhausting all 48G on this host for this single use of memory? Status indicates bytes_received
on 8/6/2017 was 885,485,832 from max_used_connections
of 86 in first 1520 seconds of uptime.
B) innodb_io_capacity
at 200 is likely a significant throttle to your possible IOPS, we run here at 700. sqlio.exe utility was used to guide us in this direction.
C) innodb_io_capacity_max
should be likely be adjusted as well.
D) thread_cache_size
of 11, consider going to 128.
E) thread_concurrency
of 10, consider going to 30.
F) I understand the length of process-list.txt
in the number of Sleep ID's is likely caused by the use of persistent connections. The connection is just waiting for some additional activity from the client for an extended period of time. 8/8/2017
G) STATUS Com_begin
count is usually very close to Com_commit
count, not in your case. 8/8/2017 Com_begin
was 2 and Com_commit
was 709,910 for 11 hours of uptime.
H) It would probably be helpful to see just 3 minutes of a General Log, if possible.
Keep me posted on your progress.
Upvotes: 0
Reputation: 1117
Well, I resolved the issue. I appreciate all the insight from those who responded. The solution is very strange, and I cannot explain why this solves the problem, but it does. What I did was add the following line to my.cnf:
log_bin
You may, in addition, need to add the following:
expire_logs_days = <some number>
We have seen at least one instance where the logs accumulated and filled up a disk. The default is 0 (no auto removal). https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_expire_logs_days
Upvotes: 8
Reputation: 1317
I would check table_open_cache. You have a lot of tables and it is clearly reflected in avg opened files per second: about 48 when a normal value is between 1 and 5.
That is confirmed by the values of Table_open_cache_misses
and Table_open_cache_overflows
,
ideally those values should be cero. Those means failed attempts to use cache and in consequence wasted memory.
You should try increasing it at least to 3000 and see results.
Since you are on CentOS:
ulimit
it is unlimited or about 20000 for your 6000 tables.swappiness
to 1. I think it is better to have some swapps (while observing) than crashes.Upvotes: 1
Reputation: 2343
Please enable the MySQL error log in your usual configuration. When MySQL crashes, protect the error log before restarting, and add last error-log available to your Question, please. It should have a clue WHY MySQL is failing. Running the 'small' configuration will run like a dog, when supporting the volume of activity reported by SHOW GLOBAL STATUS. Please get back to your usual production configuration. I am looking at your provided details and will have some tuning suggestions in next 24 hours. It appears most of the process-list activities are related to replication. Would that be true?
Upvotes: 0
Reputation: 2327
Results are stored and fed from memory and given that you're running 283 per second, there's probably a lot of data at any given moment being dished out.
I would think that you are doing a good job squeezing a lot out of that server. Consider the tables are one thing, then the schema involved for 6000 tables, plus the fact that you're pulling 283 queries per second against a 35 GB database and that those results are held in memory while they are being served. The rest of us might as well learn from you.
Regarding the stopping and restarting of MySQL
[ERROR] InnoDB: space header page consists of zero bytes in tablespace ./ca_uim/t_qos_snapshot.ibd (table ca_uim/t_qos_snapshot)
Your might consider trying
innodb_flush_method=normal
which is recommended here and here, but I can't promise it will work.
Upvotes: 1