Reputation: 1001
I can't find any documentation on these two terms. I pored over AWS docs and Google results.
What is the difference between burst limit and rate limit? When I go to change the settings for default route throttling on my API, there are just two number inputs. It doesn't say what unit or time frame these numbers represent. Is it API calls per second? per minute?
Upvotes: 48
Views: 42515
Reputation: 4581
I'd like to offer an analogy to explain these two limits.
    |oooooooooooooooooooooooooooooo|   rate limit - the speed of reloading
    |oooooooooooooooooooooooooooooo|   the basket of the ball gun:
    |oo                          oo|   how fast a new ball is added
    |oo   no limit in quantity,  oo|   to the basket whenever an empty
    |oo   but throughput limit   oo|   slot is detected
     \----------------------------/
              \  \
               |  |
             >>|  |  <- an empty-slot sensor that requests one more ball
               |  |
              /    \
             |  o   |   burst limit - the capacity of the ball-gun basket
             |ooooo |
            _|______|__________
           0_|||||||----------   o
            /||\==|||            o
            /\    |||
Imagine an attraction at a kids' center: a ball-gun machine. A kid operating the machine can rotate it 360° and pull the trigger to shoot (it keeps shooting automatically until the trigger is released). The ball gun has a basket of balls with a sensor on top, so whenever an empty slot appears, a new ball is sent to the basket automatically, but only one at a time, making sure each ball has reached the basket.
The basket has a capacity of X balls, and the speed at which a ball reaches the basket (whenever the sensor detects an empty slot) is Y balls per period of time (let's say 10 seconds). The shooting speed is amazing: an entire basket can be released in a moment.
X balls from the basket are shot in a moment, and after that a new shot is made every time a ball lands in the basket (Y shots every 10 seconds). So the burst limit of API Gateway is similar to the basket capacity X: how many requests can be accepted at once, when all tokens are available. And the rate limit is similar to Y: how quickly the capacity is reloaded (how many tokens are added per period of time - one second in the case of APIGW - whenever there is a free slot). A rate limit of 1000 means a token is added every 1/1000 second (1 ms). An incoming APIGW request corresponds to pulling the trigger: when the basket is empty, no shot is fired (the connection is refused by APIGW); when there is a ball in the basket, the gun shoots (APIGW accepts the connection).
It's important to understand that the number of requests that can be accepted in the next period of time depends only on the number of tokens left (how many balls are still in the basket of the ball-gun machine) and the reload speed (how many balls will be added during that period). It is completely unrelated to the state of already-accepted connections: no additional tokens are added when a connection closes; tokens are added at a constant speed whenever the total token count is below the burst limit. Burst limit and rate limit control only the rate of accepting new connections, not the number of concurrent connections. X and Y in the ball-gun analogy control how fast you can shoot, regardless of hitting the target. This article helped me to understand that. If you really need to control the number of open connections (concurrent requests), you should check other possibilities - they are described in the article as well.
Upvotes: 1
Reputation: 809
Let's consider the case where we have a burst limit of 50, a rate limit of 25, and 50 requests coming in each second, with 2 seconds of processing time per request:
At time t0: 50 requests come in, 25 are processed (start to be processed) and the rest are queued up. The bucket has 0 tokens left, as it's been exhausted by the initial burst of requests.
At time t1: 50 more requests come in. However, there are no tokens left in the bucket from the previous second, so these requests are responded with a 429 error. At the same time, the first 25 requests from t0 are still being processed.
At time t2: 50 more requests come in. But the bucket has been replenished with 25 tokens, so it can accommodate 25 new requests, but the remaining 25 requests will get a 429 error. The first 25 requests from t0 have now finished processing, and the bucket starts to refill.
At time t3: 50 more requests come in. The bucket has been replenished by another 25 tokens, so it can again accommodate 25 new requests, but the remaining 25 requests will get a 429 error.
And so on. One needs to take into account the processing time.
Upvotes: 1
Reputation: 5548
My understanding of the rate limit and burst limit differs a bit from what is being explained by Tobias Geiselmann (the most upvoted answer).
I don't think there is any concept of concurrency per se in the way throttling works in API Gateway. Requests simply get processed as fast as possible, and if your API implementation takes a long time to handle a request, there will just be more concurrent processes executing requests - possibly far more than the throttling limits you set in API Gateway.
The rate limit determines the maximum number of requests that can be made before the burst limit starts taking effect, filling up your "burst bucket". The bucket fills with tokens as requests come in and "empties" itself at the rate you have set as the rate limit.
So if requests keep arriving faster than the bucket's "output", it will eventually become "full", and then throttling starts to happen, with "too many requests" errors.
For example, say you set a rate limit of 10 requests per second (RPS) with a burst limit of 100:

If requests keep coming in at 10 RPS or lower, the burst bucket simply remains empty: its input and output stay below the set rate limit.

Let's now say the request rate goes beyond 10 RPS:

The first second, 18 requests come in. The bucket can output 10 RPS, so 18 - 10 = 8 tokens accumulate in the bucket.

The second second, 34 more requests come into the bucket. The bucket can still take out 10 RPS, so 34 - 10 = 24 more tokens accumulate. The bucket now contains 8 + 24 = 32 tokens.

The third second, 85 more requests are made and added to the bucket, and again 10 requests are taken out, so 85 - 10 = 75 more tokens would accumulate. But the bucket already held 32 tokens, and because 32 + 75 = 107 is higher than 100, the last 7 requests are throttled and get a "Too many requests" response. The bucket is full, containing 100 tokens.

The fourth second, 5 more requests come in. The bucket can take out 10 tokens, ending up with 100 + 5 - 10 = 95 tokens. No more throttling happens.

And so on.
So concurrency is not really relevant here. If the requests take 15 seconds each to execute, you could very well end up with 10 RPS * 15 seconds = 150 concurrent requests even if your set limit is just 10 RPS with a burst limit of 100.
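The arithmetic above can be reproduced with a short per-second sketch of this "filling bucket" model (function and variable names are illustrative, not an AWS API):

```python
def meter(rate: int, burst: int, arrivals: list[int]) -> list[tuple[int, int]]:
    """Per-second sketch of the model above: each incoming request adds
    a token, the bucket drains `rate` tokens per second, and overflow
    past `burst` means throttled requests.  Returns (throttled, level)."""
    level, out = 0, []
    for n in arrivals:
        level = max(0, level + n - rate)   # inflow minus steady outflow
        throttled = max(0, level - burst)  # overflow -> "Too many requests"
        level = min(level, burst)
        out.append((throttled, level))
    return out

# Rate limit 10 RPS, burst 100, the arrivals from the example:
print(meter(10, 100, [18, 34, 85, 5]))
# -> [(0, 8), (0, 32), (7, 100), (0, 95)]
```

This is the mirror image of the usual token-bucket description (tokens here represent queued pressure rather than remaining capacity), but the accept/reject decisions come out the same.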
Upvotes: 36
Reputation: 2496
The burst limit defines the number of requests your API can handle concurrently, while the rate limit defines the number of allowed requests per second. This is an implementation of the Token bucket algorithm.
Concurrently means that requests run in parallel. Assuming one request takes 10 ms, you could have 100 requests per second with a concurrency of 1 if they were all executed in series. But if they were all executed at the same moment, the concurrency would be 100. In both cases a rate limit of 100 would suffice. In the first case, a burst limit of 1 would allow all requests to succeed; in the second case, it would deny 99 requests.
The official documentation only mentions the Token bucket algorithm briefly.
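The serial-versus-parallel arithmetic above can be checked with a small token-bucket sketch (this uses exact rational arithmetic via `fractions` to avoid floating-point drift; the names are illustrative):

```python
from fractions import Fraction

def accepted_count(rate, burst, timestamps):
    """Count how many requests arriving at the given timestamps (in
    seconds) a token bucket with this rate and burst would accept."""
    tokens, last, accepted = Fraction(burst), Fraction(0), 0
    for t in timestamps:
        t = Fraction(t)
        # Refill at a constant rate, capped at the burst capacity.
        tokens = min(Fraction(burst), tokens + (t - last) * rate)
        last = t
        if tokens >= 1:
            tokens -= 1
            accepted += 1
    return accepted

serial = [Fraction(i, 100) for i in range(100)]   # one request every 10 ms
parallel = [Fraction(0)] * 100                    # all at the same instant

print(accepted_count(100, 1, serial))      # 100 - burst 1 suffices in series
print(accepted_count(100, 1, parallel))    # 1   - 99 requests denied
print(accepted_count(100, 100, parallel))  # 100 - burst 100 absorbs the spike
```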
Upvotes: 64
Reputation: 29
There are three "numbers" to set under Throttling:
Upvotes: 2