Reputation: 1001
I can't find any documentation on these two terms. I pored over AWS docs and Google results.
What is the difference between burst limit and rate limit? When I go to change the settings for default route throttling on my API, there are just two number inputs. It doesn't say what unit or time frame these numbers represent. Is it API calls per second? per minute?
Upvotes: 48
Views: 42515
Reputation: 4581
I'd like to offer an analogy to explain these two limits.
    |oooooooooooooooooooooooooooooo|   rate limit - the speed of reloading
    |oooooooooooooooooooooooooooooo|   the basket of the ball gun:
    |oo                          oo|   how fast a new ball is added
    |oo   no limit in quantity,  oo|   to the basket whenever an empty
    |oo   but throughput limit   oo|   slot is detected
     \----------------------------/
              \  \
               |  |
             >>|  |  <- an empty-slot sensor that requests one more ball
               |  |
              /    \
             |  o   |   burst limit - the capacity of the ball-gun basket
             |ooooo |
            _|______|__________
           0_|||||||----------   o
            /||\==|||            o
            /\    |||
Imagine an attraction at a kids' center: a ball-gun machine. A kid operating the machine can rotate it 360° and pull the trigger to shoot (it keeps shooting automatically until the trigger is released). The ball gun has a basket of balls with a sensor on top, so whenever an empty slot appears, a new ball is sent to the basket automatically, but only one at a time, making sure each ball has reached the basket.
The basket has a capacity of X balls, and the speed at which a ball reaches the basket (whenever the sensor detects an empty slot) is Y balls per period of time (let's say 10 seconds). The shooting speed is amazing: an entire basket can be released in a moment.
X balls from the basket are shot in a moment, and after that a new shot is made every time a ball lands in the basket (Y shots every 10 seconds). So the burst limit of API Gateway is similar to the basket capacity X: how many requests can be accepted at once, when all tokens are available. And the rate limit is similar to Y: how quickly the capacity is reloaded (how many tokens are added per period of time - one second in the case of APIGW - whenever there is a free slot). A rate limit of 1000 means a token is added every 1/1000 second (1 ms). An incoming APIGW request corresponds to pulling the trigger: when the basket is empty, no shot is fired (the connection is refused by APIGW); when there is a ball in the basket, the gun shoots (APIGW accepts the connection).
It's important to understand that the number of requests that can be accepted in the next period of time depends only on the number of tokens left (how many balls are still in the basket of the ball-gun machine) and the reload speed (how many balls will be added during that period). It is completely unrelated to the state of already-accepted connections: no additional tokens are added when a connection closes; tokens are added at a constant speed whenever the total token count is below the burst limit. Burst limit and rate limit control only the rate of accepting new connections, not the number of concurrent connections. X and Y in the ball-gun analogy control how fast you can shoot, regardless of hitting the target. This article helped me to understand that. If you really need to control the number of open connections (concurrent requests), you should check other possibilities - they are described in the article as well.
Upvotes: 1
Reputation: 809
Let's consider the case where we have a burst limit of 50, a rate limit of 25, and 50 requests coming in each second, with 2 seconds of processing time per request:
At time t0: 50 requests come in, 25 are processed (start to be processed) and the rest are queued up. The bucket has 0 tokens left, as it's been exhausted by the initial burst of requests.
At time t1: 50 more requests come in. However, there are no tokens left in the bucket from the previous second, so these requests are responded with a 429 error. At the same time, the first 25 requests from t0 are still being processed.
At time t2: 50 more requests come in. But the bucket has been replenished with 25 tokens, so it can accommodate 25 new requests, but the remaining 25 requests will get a 429 error. The first 25 requests from t0 have now finished processing, and the bucket starts to refill.
At time t3: 50 more requests come in. The bucket has been replenished by another 25 tokens, so it can again accommodate 25 new requests, but the remaining 25 requests will get a 429 error.
And so on. One needs to take into account the processing time.
Upvotes: 1
Reputation: 5548
My understanding of the rate limit and burst limit differs a bit from what is being explained by Tobias Geiselmann (the most upvoted answer).
I don't think there is any concept of concurrency per se in the way throttling works in API Gateway. Requests simply get processed as fast as possible, and if your API implementation takes a long time to handle a request, there will just be more concurrent processes executing requests - possibly far more than the throttling limits you set in API Gateway.
The rate limit determines the maximum number of requests that can be made before the burst limit starts taking effect, filling up your "burst bucket". The bucket fills with tokens as requests come in and "empties" itself at the rate you have set as the rate limit.
So if requests keep arriving faster than the bucket's "output", it will eventually become "full", and then throttling starts to happen, with "too many requests" errors.
For example, say you set a rate limit of 10 requests per second (RPS) with a burst limit of 100:

If requests keep coming in at 10 RPS or lower, the burst bucket simply remains empty: its input and output stay below the set rate limit.

Let's now say the request rate goes beyond 10 RPS:

The first second, 18 requests come in. The bucket can output 10 RPS, so 18 - 10 = 8 tokens accumulate in the bucket.

The second second, 34 more requests come into the bucket. The bucket can still take out 10 RPS, so 34 - 10 = 24 more tokens accumulate. The bucket now contains 8 + 24 = 32 tokens.

The third second, 85 more requests are made and added to the bucket, and again 10 requests are taken out, so 85 - 10 = 75 more tokens would accumulate. But the bucket already held 32 tokens, and because 32 + 75 = 107 is higher than 100, the last 7 requests are throttled and get a "Too many requests" response. The bucket is full, containing 100 tokens.

The fourth second, 5 more requests come in. The bucket can take out 10 tokens, ending up with 100 + 5 - 10 = 95 tokens. No more throttling happens.

And so on.
So concurrency is not really relevant here. If the requests take 15 seconds each to execute, you could very well end up with 10 RPS * 15 seconds = 150 concurrent requests even if your set limit is just 10 RPS with a burst limit of 100.
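The arithmetic above can be reproduced with a short per-second sketch of this "filling bucket" model (function and variable names are illustrative, not an AWS API):

```python
def meter(rate: int, burst: int, arrivals: list[int]) -> list[tuple[int, int]]:
    """Per-second sketch of the model above: each incoming request adds
    a token, the bucket drains `rate` tokens per second, and overflow
    past `burst` means throttled requests.  Returns (throttled, level)."""
    level, out = 0, []
    for n in arrivals:
        level = max(0, level + n - rate)   # inflow minus steady outflow
        throttled = max(0, level - burst)  # overflow -> "Too many requests"
        level = min(level, burst)
        out.append((throttled, level))
    return out

# Rate limit 10 RPS, burst 100, the arrivals from the example:
print(meter(10, 100, [18, 34, 85, 5]))
# -> [(0, 8), (0, 32), (7, 100), (0, 95)]
```

This is the mirror image of the usual token-bucket description (tokens here represent queued pressure rather than remaining capacity), but the accept/reject decisions come out the same.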
Upvotes: 36
Reputation: 2496
The burst limit defines the number of requests your API can handle concurrently, while the rate limit defines the number of allowed requests per second. This is an implementation of the Token bucket algorithm.
Concurrently means that requests run in parallel. Assuming one request takes 10 ms, you could have 100 requests per second with a concurrency of 1 if they were all executed in series. But if they were all executed at the same moment, the concurrency would be 100. In both cases a rate limit of 100 would suffice. In the first case, a burst limit of 1 would allow all requests to succeed; in the second case, it would deny 99 requests.
The official documentation only mentions the Token bucket algorithm briefly.
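The serial-versus-parallel arithmetic above can be checked with a small token-bucket sketch (this uses exact rational arithmetic via `fractions` to avoid floating-point drift; the names are illustrative):

```python
from fractions import Fraction

def accepted_count(rate, burst, timestamps):
    """Count how many requests arriving at the given timestamps (in
    seconds) a token bucket with this rate and burst would accept."""
    tokens, last, accepted = Fraction(burst), Fraction(0), 0
    for t in timestamps:
        t = Fraction(t)
        # Refill at a constant rate, capped at the burst capacity.
        tokens = min(Fraction(burst), tokens + (t - last) * rate)
        last = t
        if tokens >= 1:
            tokens -= 1
            accepted += 1
    return accepted

serial = [Fraction(i, 100) for i in range(100)]   # one request every 10 ms
parallel = [Fraction(0)] * 100                    # all at the same instant

print(accepted_count(100, 1, serial))      # 100 - burst 1 suffices in series
print(accepted_count(100, 1, parallel))    # 1   - 99 requests denied
print(accepted_count(100, 100, parallel))  # 100 - burst 100 absorbs the spike
```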
Upvotes: 64
Reputation: 29
There are three "numbers" to set under Throttling:
Upvotes: 2