SQS CloudWatch Sanity

Question

I'm analyzing a recent load event on my SQS consumer service and am stuck on some SQS CloudWatch metrics that don't make sense to me. Essentially, the queue appears to have been flooded with messages that aren't accounted for in the metrics. Here is a summary of the data for a selected 5-minute period:

ApproximateNumberOfMessagesVisible: 215,686 -> 233,605 (gain of 17,919 for this period)
ApproximateNumberOfMessagesNotVisible: 2,239 -> 2,129 (loss of 110 for this period)
NumberOfMessagesSent: 31,441
NumberOfMessagesDeleted: 24,665
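To double-check the arithmetic, here is a small sketch that recomputes the deltas from the figures above. The variable names are just illustrative. It treats the two Approximate* gauges as start/end snapshots and the two counters as totals for the window.

```python
# Figures quoted above for the selected 5-minute window.
# Approximate* metrics are point-in-time gauges; Sent/Deleted are window totals.
visible_start, visible_end = 215_686, 233_605
not_visible_start, not_visible_end = 2_239, 2_129
sent = 31_441
deleted = 24_665

visible_delta = visible_end - visible_start              # change in visible backlog
not_visible_delta = not_visible_end - not_visible_start  # change in in-flight count
net_inflow = sent - deleted                              # messages added but not removed

print(f"Visible change:     {visible_delta:+,}")
print(f"Not-visible change: {not_visible_delta:+,}")
print(f"Sent - Deleted:     {net_inflow:+,}")
print(f"Ratio:              {visible_delta / net_inflow:.2f}x")
```

This prints a visible gain of +17,919 against a net inflow of +6,776, roughly 2.64 times larger.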

What baffles me is that ApproximateNumberOfMessagesVisible grew by about 17.9k, more than 2.5 times the number of messages that went unprocessed (NumberOfMessagesSent - NumberOfMessagesDeleted = 6,776).

I've included the ApproximateNumberOfMessagesNotVisible figures as well, in case a large batch of in-flight messages had suddenly become visible again, but that doesn't seem to be the case: the not-visible count dropped by only 110.
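The reasoning above amounts to a conservation check. Every message is assumed to be either visible or in flight. It enters through a send and leaves through a delete. A receive or a visibility-timeout expiry only moves it between the two gauges. Under those assumptions, the change in total depth (visible + not visible) should roughly equal sent - deleted. The sketch below computes how much growth is left unexplained. It is a simplification. It ignores delayed messages, retention expiry, and dead-letter redrive, and it treats the Approximate* values as exact. The `Window` class and `unexplained_growth` function are hypothetical helpers, not part of any AWS API.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical container for one period's CloudWatch readings."""
    visible_start: int
    visible_end: int
    not_visible_start: int
    not_visible_end: int
    sent: int
    deleted: int


def unexplained_growth(w: Window) -> int:
    """Observed change in total queue depth minus the change implied by sent - deleted.

    Assumes messages only enter via sends and leave via deletes; receives and
    visibility-timeout expiries just shift messages between the two gauges.
    Delayed messages, retention expiry, and DLQ redrive are ignored.
    """
    observed = (w.visible_end + w.not_visible_end) - (w.visible_start + w.not_visible_start)
    expected = w.sent - w.deleted
    return observed - expected


window = Window(215_686, 233_605, 2_239, 2_129, 31_441, 24_665)
print(unexplained_growth(window))  # observed 17,809 vs expected 6,776
```

With the numbers above, total depth grew by 17,809 while sends minus deletes account for only 6,776. That leaves about 11,033 messages unexplained by this simple model.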

How could this be possible?
