Tags: database, performance, postgresql, amazon-web-services, amazon-rds

Reputation: 1164

Determining when to scale up my AWS RDS database?

I'm building a web service, consisting of many different components, all of which could conceivably be bottlenecks. I'm currently trying to figure out what metrics I should be looking for, when deciding whether or not my database (on AWS RDS) is the bottleneck in the chain.

Looking at AWS Cloudwatch, I see a number of RDS metrics given. Full list:

CPUCreditBalance
CPUCreditUsage
CPUUtilization
DatabaseConnections
DiskQueueDepth
FreeStorageSpace
FreeableMemory
NetworkReceiveThroughput
NetworkTransmitThroughput
ReadIOPS
ReadLatency
ReadThroughput
SwapUsage
WriteIOPS
WriteLatency
WriteThroughput

The key metrics that I think I should be paying attention to:

Read/Write Latency
CPU-Utilization
Freeable Memory

With the latency metrics, I'm thinking that I should set up an alert if either one exceeds 300 ms (for fast website responsiveness), though I recognize that this is very much workload dependent.

For CPU and memory utilization, I have no idea what numbers to use. I'm thinking I should set an alert at 75% CPU utilization, and at a 75% drop in Freeable Memory.

Am I on the right track with the metrics I've shortlisted above, and the thresholds I have guessed? Are there any other metrics I should be paying attention to?

Upvotes: 1

Answers (2)

E.J. Brennan

Reputation: 46824

I think you are on the right track, especially with the latency metrics. For a typical application with a database back end, read/write latency is what the user will notice most if it degrades. Sure, memory or CPU usage may spike, but does any user care? Not unless it then causes the latency to go up.

I'd start with the metrics you listed as the low-hanging fruit and adjust accordingly.
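As a rough sketch of that starting point, the thresholds from the question could be written down as CloudWatch alarm definitions. The instance identifier `my-db-instance`, the 8 GiB freeable-memory baseline, the 5-minute period, and the three evaluation periods are all assumptions for illustration, not recommendations. Note that CloudWatch reports the RDS `ReadLatency` and `WriteLatency` metrics in seconds and `FreeableMemory` in bytes, so 300 ms becomes 0.3.

```python
def build_rds_alarms(db_instance_id, baseline_freeable_bytes):
    """Return keyword-argument dicts shaped for CloudWatch put_metric_alarm."""
    # (metric name, comparison operator, threshold)
    # Thresholds mirror the guesses in the question; adjust to your workload.
    specs = [
        ("ReadLatency", "GreaterThanThreshold", 0.3),    # seconds (300 ms)
        ("WriteLatency", "GreaterThanThreshold", 0.3),   # seconds (300 ms)
        ("CPUUtilization", "GreaterThanThreshold", 75.0),  # percent
        # "75% drop" is read here as falling below 25% of an assumed baseline.
        ("FreeableMemory", "LessThanThreshold", baseline_freeable_bytes * 0.25),
    ]
    return [
        {
            "AlarmName": f"{db_instance_id}-{metric}",
            "Namespace": "AWS/RDS",
            "MetricName": metric,
            "Dimensions": [
                {"Name": "DBInstanceIdentifier", "Value": db_instance_id}
            ],
            "Statistic": "Average",
            "Period": 300,            # assumed: 5-minute windows
            "EvaluationPeriods": 3,   # assumed: 3 consecutive breaches
            "Threshold": threshold,
            "ComparisonOperator": comparison,
        }
        for metric, comparison, threshold in specs
    ]


# Hypothetical instance name and an assumed 8 GiB baseline of freeable memory.
alarms = build_rds_alarms("my-db-instance", 8 * 1024**3)
for alarm in alarms:
    print(alarm["MetricName"], alarm["ComparisonOperator"], alarm["Threshold"])
```

With boto3 installed and credentials configured, each dict could then be passed along as `boto3.client("cloudwatch").put_metric_alarm(**alarm)`. You would probably also want to add `AlarmActions` so that a breach notifies someone.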

Upvotes: 1

John Rotenstein

Reputation: 270076

The answer is totally dependent on your application. Some applications will require more CPU, some will need more RAM. There is no definitive answer.

The best thing is to monitor your database (with the metrics you list above). Then, when performance falls below what you want, take a look at which metrics are showing problems. These should be the first ones you track for scaling your database.

The key idea is that if your customers are experiencing problems, it should be appearing in your metrics somewhere. If this isn't the case, then you're not collecting sufficient metrics.
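To illustrate the "look at which metrics are showing problems" step, here is a minimal sketch that takes samples collected during a slow period and reports which metrics breached a limit. The limits reuse the guesses from the question, the 2 GiB freeable-memory floor is an assumption, and the sample values are made up; in practice the samples would come from CloudWatch.

```python
from statistics import mean

# Hypothetical limits: metric name -> (direction, limit).
# Latencies are in seconds, CPU in percent, memory in bytes.
LIMITS = {
    "ReadLatency": ("above", 0.3),
    "WriteLatency": ("above", 0.3),
    "CPUUtilization": ("above", 75.0),
    "FreeableMemory": ("below", 2 * 1024**3),  # assumed floor
}


def metrics_showing_problems(samples, limits=LIMITS):
    """Return names of metrics whose average over the window breaches its limit."""
    flagged = []
    for name, values in samples.items():
        if name not in limits or not values:
            continue  # no limit defined, or nothing collected
        direction, limit = limits[name]
        avg = mean(values)
        if direction == "above" and avg > limit:
            flagged.append(name)
        elif direction == "below" and avg < limit:
            flagged.append(name)
    return flagged


# Made-up samples from a period when users reported slowness.
slow_window = {
    "ReadLatency": [0.02, 0.03, 0.02],
    "WriteLatency": [0.45, 0.50, 0.40],
    "CPUUtilization": [35.0, 40.0, 38.0],
}
print(metrics_showing_problems(slow_window))
```

If users report problems and this list comes back empty, that is a hint that a relevant metric is not being collected yet.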

Upvotes: 1
