Reputation: 6181
I have published my website on Amazon EC2 (Singapore region) and I have used MySQL RDS medium instance for data storage in the same region.
In my case, most of the select queries have some COUNT functionality. These queries are showing very slow results. I have already created appropriate indexes on the table and I checked the EXPLAIN command to analyze these queries. It shows me that full table scans are necessary to get results.
On my RDS medium instance, I have configured the custom parameter group with the following settings.
log_queries_not_using_index = true,
slow_query_log = true,
long_query_time = 2 sec,
max_connections = 303,
innodb_buffer_pool_size = {DBInstanceClassMemory*3/4}
Yesterday my CPU utilization went above 95% and my site crashed due to this. There was no major increase in traffic.
Also, I dumped the data on my local system, and tested one of the COUNT queries. While it takes about 1.5 seconds for it to run on RDS, it takes only about 400 milliseconds for it to run on my local system. The configuration on my local system (4GB RAM, Intel core 2 duo 2.8GHz) is:
max_connections = 100,
slow_query_log = true,
long_query_time = 2 sec,
innodb_buffer_pool_size = 72351744
So, what could be the reason for the spike in CPU utilization as well as the difference in performance times between RDS and my local system?
Thanks,
Upvotes: 0
Views: 2252
Reputation: 1669
Depending on the table size - the RDS instance uses EBS to store the data - if you're doing a table scan and its going to have to get the data from EBS instead of a locally cached in-memory key and then scan it. So - you're likely seeing the increased lag of the network between the RDS instance where the CPU resides and the EBS data in the SAN. When you do the same query on your local computer the only lag is the disk head seek time.
Then there is the difference between CPU time - an m1.medium has less CPU time (and therefore less opportunity to scan the results) than the core2 duo based on Amazon's definition of EC2 units.
HTH - in general, I'd try to avoid doing COUNT(s) in your queries as this is a terribly inefficient query (as you've seen) which can and will continue to cause nasty undesired results when the DB is under real-time varying levels of load.
R
Upvotes: 1