Reputation: 5592
We have an old web service that returns data through an API. That service is very slow, and it can't handle many simultaneous queries. Now a new Azure web app has been developed by another part of the company. They call this slow API and didn't realize how bad it was until we had a meeting about why the database server almost went down. The API sometimes needs 4-8 seconds to reply, which is not great when it's called 50 times by a lot of consumers.
I added Azure API Management between the old API and the new Azure web app. My hope was to limit the calls to the backend and then use the cache to reduce their number. However, this new service requires "fast data" and wants to call the backend at least every 10 seconds. I would like API Management to call the backend no more than necessary, which translates to: same query = serve from cache for 10 seconds, then fetch from the backend; new parameters = fetch from the backend, then cache for 10 seconds.
The cache works, but every 10 seconds I end up with a burst of near-simultaneous calls through API Management, which makes life hard for the old API.
Is it possible to allow only one call (per Authorization header and parameter set) through to the API and then cache the result? Meaning that, if 5 users ask API Management the same question at the same time, the first request is sent to the backend and its response is cached, while the remaining 4 are queued until the cached copy is available? I would rather have them in a retry loop for a few seconds than return a 429 Too Many Requests error.
I have tried a few policy variations, such as putting limit-concurrency inside a retry, but they all result in multiple calls to the backend until the first one returns and is cached. Example:
<policies>
    <inbound>
        <base />
        <cache-lookup vary-by-developer="false" vary-by-developer-groups="false" allow-private-response-caching="true" downstream-caching-type="none">
            <vary-by-header>Accept</vary-by-header>
            <vary-by-header>Accept-Charset</vary-by-header>
            <vary-by-header>Authorization</vary-by-header>
        </cache-lookup>
        <retry condition="true" count="8" interval="1" first-fast-retry="false">
        </retry>
    </inbound>
    <backend>
        <limit-concurrency key="backend-server" max-count="1">
            <forward-request timeout="60" />
        </limit-concurrency>
    </backend>
    ...
</policies>
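Conceptually, what I'm after is something like the sketch below (untested; cacheKey and cachedResponse are placeholder names I made up). The idea: do the caching manually with cache-lookup-value/cache-store-value, key limit-concurrency per request instead of globally, and re-check the cache inside the lock so queued callers can be served the response the first caller stored on its way out. I'm not sure whether limit-concurrency is enforced across instances, though.

<policies>
    <inbound>
        <base />
        <!-- Build a per-request key from the Authorization header + query string -->
        <set-variable name="cacheKey" value="@(context.Request.Headers.GetValueOrDefault("Authorization", "") + context.Request.Url.QueryString)" />
        <cache-lookup-value key="@((string)context.Variables["cacheKey"])" variable-name="cachedResponse" />
        <choose>
            <when condition="@(context.Variables.ContainsKey("cachedResponse"))">
                <!-- Fresh copy available: answer without touching the backend -->
                <return-response>
                    <set-status code="200" reason="OK" />
                    <set-body>@((string)context.Variables["cachedResponse"])</set-body>
                </return-response>
            </when>
        </choose>
    </inbound>
    <backend>
        <!-- Serialize backend calls per key instead of globally -->
        <limit-concurrency key="@((string)context.Variables["cacheKey"])" max-count="1">
            <!-- Re-check: a caller queued behind the first request may now
                 find the response that request stored -->
            <cache-lookup-value key="@((string)context.Variables["cacheKey"])" variable-name="cachedResponse" />
            <choose>
                <when condition="@(context.Variables.ContainsKey("cachedResponse"))">
                    <return-response>
                        <set-status code="200" reason="OK" />
                        <set-body>@((string)context.Variables["cachedResponse"])</set-body>
                    </return-response>
                </when>
                <otherwise>
                    <forward-request timeout="60" />
                </otherwise>
            </choose>
        </limit-concurrency>
    </backend>
    <outbound>
        <base />
        <!-- Cache the body for 10 seconds so later calls hit the cache -->
        <cache-store-value key="@((string)context.Variables["cacheKey"])" value="@(context.Response.Body.As<string>(preserveContent: true))" duration="10" />
    </outbound>
</policies>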
Upvotes: 1
Views: 1198
Reputation: 7810
At the moment limit-concurrency works at the node level only. So it will allow only one call to the backend only if you have a Developer tier APIM instance. With Basic, Standard, or Premium you have at least two nodes (1 unit = 2 nodes), so this configuration will allow at most two parallel calls under load.
Now if you specify max-count=X, internally each node sets its limit to X/node_count. Of course we can't set it to 0, so 1 is the per-node minimum; with max-count=1 on two nodes, for example, each node still allows one call. Thus under load there will always be at least node_count concurrent calls.
We'll be updating this policy in the future to support cross-node synchronization.
Upvotes: 1