Reputation: 193
I have two App Service endpoints with the same weight (1) configured in Azure Traffic Manager. Some details about these two API apps:
Endpoint A: East US 2, App service plan is S2
Endpoint B: West US, App service plan is S1
Both have the same scale-out settings: min 4, max 7, default 5.
According to the documentation, the weighted routing method appears to behave like round-robin by default. Since the two endpoints have the same weight, I expected them to receive nearly the same number of requests (a ratio close to 1:1) during my load tests. They did not; the results fluctuate.
For example, when I ran a test of 1000 requests ramping up over 10 seconds, the ratio of requests received by A to requests received by B could be 3:1. When I ran the same test a second time, it could go the opposite way, with B receiving many more requests than A. When I increased the number of requests, I sometimes got a 1:1 result, but this random behavior is not what we want.
How can we ensure that traffic is distributed evenly across these two endpoints when using the weighted routing method in Azure Traffic Manager?
Upvotes: 0
Views: 370
Reputation: 911
As mentioned in the Azure Traffic Manager documentation on the weighted traffic-routing method:
Using the same weight across all endpoints results in an even traffic distribution. However a point to remember is that DNS responses get cached by clients. They're also cached by the recursive DNS servers that the clients use to resolve DNS names. This caching can have an effect on weighted traffic distributions. When the number of clients and recursive DNS servers is large, traffic distribution works as expected. However, when the number of clients or recursive DNS servers is small, caching can significantly skew the traffic distribution.
The documentation also recommends flushing the DNS client cache when testing the weighted traffic-routing method.
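To illustrate the caching effect described above, here is a minimal simulation sketch. It assumes each resolver performs a single DNS lookup, receives endpoint A or B with equal probability, and then sends all of its requests to the cached answer. The function names and the resolver counts are made up for illustration; this is not how Traffic Manager is implemented, only a model of why few resolvers can produce lopsided splits.

```python
import random


def simulate_split(num_resolvers, requests_per_resolver, seed=None):
    """Model one DNS lookup per resolver, with the answer cached for the whole test.

    Assumption: equal weights, so each lookup returns A or B with probability 0.5.
    """
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0}
    for _ in range(num_resolvers):
        endpoint = rng.choice(["A", "B"])
        # Every request behind this resolver goes to the cached endpoint.
        counts[endpoint] += requests_per_resolver
    return counts


def share_of_a(counts):
    """Fraction of all requests that landed on endpoint A."""
    total = counts["A"] + counts["B"]
    return counts["A"] / total


# 1000 requests behind only 4 resolvers: A can only receive 0, 250, 500, 750 or 1000.
few = simulate_split(num_resolvers=4, requests_per_resolver=250, seed=1)

# 4000 requests, each behind its own resolver: the split settles near 50/50.
many = simulate_split(num_resolvers=4000, requests_per_resolver=1, seed=1)

print("few resolvers :", few, round(share_of_a(few), 3))
print("many resolvers:", many, round(share_of_a(many), 3))
```

With only a handful of resolvers, a 3:1 split like the one in the question is an ordinary outcome of this model, and it can flip between runs. A load test that runs from a single machine is likely close to the "few resolvers" case.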
The results of the DNS lookup are cached for the duration of the DNS Time-to-live (TTL). The default TTL for Traffic Manager is 300 seconds.
The duration of the cache is determined by the 'time-to-live' (TTL) property of each DNS record. Shorter values result in faster cache expiry and thus more round-trips to the Traffic Manager name servers. Longer values mean that it can take longer to direct traffic away from a failed endpoint. Traffic Manager allows you to configure the TTL used in Traffic Manager DNS responses to be as low as 0 seconds and as high as 2,147,483,647 seconds, enabling you to choose the value that best balances the needs of your application.
A TTL of 0 means that downstream DNS resolvers don’t cache query responses and all queries are expected to reach the Traffic Manager DNS servers for resolution.
My recommendation is to reduce the TTL of your Traffic Manager profile to 5 seconds and run the test again.
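As a rough back-of-the-envelope check on why the TTL matters for a short test, the sketch below estimates how many fresh DNS lookups one caching resolver would make during a test window. It assumes the resolver starts with an empty cache, receives continuous traffic, and honors the TTL exactly; `lookups_per_resolver` is a hypothetical helper, not part of any Azure SDK.

```python
import math


def lookups_per_resolver(test_seconds, ttl_seconds):
    """Estimate fresh DNS lookups by one caching resolver during the test.

    Assumption: a lookup happens at t = 0 and again each time the TTL expires,
    for as long as the test is still running. Only positive TTLs are modeled.
    """
    if test_seconds <= 0 or ttl_seconds <= 0:
        raise ValueError("test_seconds and ttl_seconds must be positive")
    return math.ceil(test_seconds / ttl_seconds)


# The 10-second test from the question, at a 300-second and a 5-second TTL.
print(lookups_per_resolver(10, 300))
print(lookups_per_resolver(10, 5))
# A longer 60-second test at a 5-second TTL.
print(lookups_per_resolver(60, 5))
```

Under these assumptions, a 10-second test at a 300-second TTL gets a single answer per resolver, so each resolver sends all of its traffic to one endpoint. A 5-second TTL allows more lookups, and running the test for longer or from more clients gives the weighting more chances to even out.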
Upvotes: 0