Simon Ninon
Simon Ninon

Reputation: 2451

RDS MySQL Insights: Top Query "commit"

We are trying to reduce the load on our AWS RDS instance by going through the performance insights.

The top SQL statement listed in the insight is "commit":

top sql statements affecting the database load

In particular, the commit statement spends most of its time in the "init" state:

wait states for commit statements

When looking at the overall load by states, this "init" state indeed constitutes a large chunk of the load:

overall database load by wait states

So, now, we are not too sure what to derive from this information.

  1. First, we have autocommit enabled. Does this "COMMIT" statement includes the automatic COMMIT statement from the autocommit configuration, or is it only exclusive to explicit COMMIT statements when our application is performing transactions?
  2. I checked more, and it seems this "init" state is related to the database flushing the binlog to disk (https://dev.mysql.com/doc/refman/8.0/en/general-thread-states.html). That could make sense since we have sync_binlog=1 on our master. But is this guess correct, or is there more that could be involved?
  3. If that's the case, I read about tuning sync_binlog (e.g. sync_binlog=0, or sync_binlog>1), but with greater risk in case of hardware failure (https://aws.amazon.com/blogs/database/best-practices-for-configuring-parameters-for-amazon-rds-for-mysql-part-2-parameters-related-to-replication/). For this part, I am unclear what is a good practice when scaling RDS with one master and some read-replicas: is it a standard practice to touch this setting on the master, or should we focus more on reducing the amount of transactions we are performing from the application?
  4. I was able to see these "COMMIT" statements stuck in init state when running SHOW FULL PROCESSLIST. However, I'm wondering if there is a way to see what's the querie(s) related to that COMMIT statement. Otherwise, seems we have to guess what transaction is the root cause.

Upvotes: 11

Views: 6209

Answers (1)

Simon Ninon
Simon Ninon

Reputation: 2451

We finally were able to tackle this "commit" load issue.

A few facts about our system:

  • About 95% of our SQL queries are reads
  • About 5% of our write queries are writes
  • Most of our queries are done with auto-commit, not with an explicit transaction

After investigating further, our findings are the followings:

  • The "commit" load from the performance insight comes from the time it takes to write the query to the binlog, on the disk
  • Because most of our write queries are performed with auto-commit, all of these queries perform a commit, triggering some binlog flush
  • The read queries do not increase this "commit" load, most likely because they don't need to write to the binlog
  • Thus, the 5% of queries performing writes were causing most of the database load!

For us, the most effective way to fix it is to batch writes together within a single transaction. Many of the writes we are doing do not need to be written to the DB immediately, and it may be OK to lose some of them (most of our writes are related to stats or tracking). Thus, our current solution is to store our stats/tracking writes into the cache and flush them to DB once in a while within a single transaction. This reduces the number of commits, effectively reducing the load.

This solution may not work for everyone or may need some tuning. From my understanding, there are some alternative solutions like:

  • Shard your database and move the tables causing a lot of write loads to another instance
  • Reduce the number of write queries
  • Play with the binlog flush settings. By default, it would flush the binlog to disk after each COMMIT, but it can be configured differently. However, this can introduce some recoverability issues if the database crashses!
  • Use a faster disk

I also answered my original questions below.

Hopefully, this can be helpful to somebody else. I found that there is very little information online related to this.

First, we have autocommit enabled. Does this "COMMIT" statement includes the automatic COMMIT statement from the autocommit configuration, or is it only exclusive to explicit COMMIT statements when our application is performing transactions?

This "COMMIT" statement includes queries performed with autocommit.

Also, in our context, none of the read queries were included in this "COMMIT" statement, most likely because no binlog flush to disk was required after read queries.

I checked more, and it seems this "init" state is related to the database flushing the binlog to disk (https://dev.mysql.com/doc/refman/8.0/en/general-thread-states.html). That could make sense since we have sync_binlog=1 on our master. But is this guess correct, or is there more that could be involved?

For us, only the binlog flush was causing the high load during this init state.

If that's the case, I read about tuning sync_binlog (e.g. sync_binlog=0, or sync_binlog>1), but with greater risk in case of hardware failure (https://aws.amazon.com/blogs/database/best-practices-for-configuring-parameters-for-amazon-rds-for-mysql-part-2-parameters-related-to-replication/). For this part, I am unclear what is a good practice when scaling RDS with one master and some read-replicas: is it a standard practice to touch this setting on the master, or should we focus more on reducing the amount of transactions we are performing from the application?

The standard practice is to keep sync_binlog=1 on the master to avoid recoverability problems after a crash, especially if you have read-replicas configured.

There are other ways to deal with this problem as listed above in my answer.

I was able to see these "COMMIT" statements stuck in init state when running SHOW FULL PROCESSLIST. However, I'm wondering if there is a way to see what's the querie(s) related to that COMMIT statement. Otherwise, seems we have to guess what transaction is the root cause.

It's doable as described in this blog post: https://www.psce.com/en/blog/2015/01/22/tracking-mysql-query-history-in-long-running-transactions/

However, it needs the events_statements_history to be enabled. On RDS this is disabled and don't allow to enable it as far as I know.

Upvotes: 16

Related Questions