Reputation: 2348
We have the following architecture for our microservices backend.
<Nginx> -----> <Facade Layer (Java/Spring Boot)> -----> <Load Balancer (HAProxy)> -----> <Service Layer (Java/Spring Boot)>
Traffic comes to Nginx, which proxy-passes it to the Facade layer, and the facade calls the services via the load balancer. We are not using service discovery; it's a static mapping — facade IPs at Nginx, service IPs at HAProxy.
Now I want to use a rate limiter/circuit breaker. At what point in the architecture should we add this? Should we add one more hop, or something else?
We are planning to use resilience4j for this.
Upvotes: 0
Views: 3454
Reputation: 564
Rate limiters and circuit breakers address two different use cases.
Rate limiters are pretty dumb (they can be complex, but are generally simple): there is a threshold, and anything above the threshold is limited. Thresholds are decided based on the capacity of the underlying service or on your application requirements (e.g., an SLA saying one user makes at most 5 API calls per minute). Sometimes they are even used to thwart DoS attacks.
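To make the "threshold per key" idea concrete, here is a minimal fixed-window sketch in plain Java (resilience4j's `RateLimiter` does this more robustly; the class and method names here are illustrative, not resilience4j's API):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal fixed-window rate limiter sketch: at most `limit` calls per
// window per key (e.g. per user). Illustrative only.
public class SimpleRateLimiter {
    private final int limit;
    private final long windowMillis;
    private final Map<String, long[]> state = new HashMap<>(); // key -> {windowStart, count}

    public SimpleRateLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean allow(String key, long nowMillis) {
        long[] s = state.computeIfAbsent(key, k -> new long[]{nowMillis, 0});
        if (nowMillis - s[0] >= windowMillis) { // window expired: reset the counter
            s[0] = nowMillis;
            s[1] = 0;
        }
        if (s[1] < limit) {
            s[1]++;
            return true;  // under the threshold: allowed
        }
        return false;     // above the threshold: limited
    }

    public static void main(String[] args) {
        SimpleRateLimiter rl = new SimpleRateLimiter(5, 60_000); // 5 calls per minute
        for (int i = 0; i < 5; i++) {
            assert rl.allow("user1", 0);
        }
        assert !rl.allow("user1", 1_000);  // 6th call in the same window: limited
        assert rl.allow("user1", 61_000);  // new window: allowed again
        System.out.println("ok");
    }
}
```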
Circuit breakers are more of a resiliency pattern, and more intelligent than rate limiters. They ensure that a failure in one component does not bring down the entire system, by backing off for some time on the assumption that the backoff interval is enough for the failure to heal/recover. When a 3rd party stops responding, you trip the circuit open at some percentage of failures and keep retrying after a backoff interval; you close the circuit again when the 3rd party starts responding. This keeps your system responsive and stops it hogging resources when no work is being done downstream.
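The open/backoff/close cycle described above can be sketched in plain Java as a small state machine (resilience4j's `CircuitBreaker` adds sliding failure-rate windows, metrics, and events on top of this; names here are illustrative, not resilience4j's API):

```java
// Minimal circuit-breaker sketch: open after N consecutive failures,
// stay open for a backoff interval, then let one trial call through
// (half-open) and close again on success. Illustrative only.
public class SimpleCircuitBreaker {
    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long backoffMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    public SimpleCircuitBreaker(int failureThreshold, long backoffMillis) {
        this.failureThreshold = failureThreshold;
        this.backoffMillis = backoffMillis;
    }

    public synchronized boolean allowCall(long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt >= backoffMillis) {
                state = State.HALF_OPEN; // backoff elapsed: permit one trial call
                return true;
            }
            return false; // fail fast while open, instead of hogging resources
        }
        return true;
    }

    public synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED; // downstream healthy again: close the circuit
    }

    public synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN; // trip the circuit and start the backoff clock
            openedAt = nowMillis;
        }
    }
}
```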
At what level do you need them? As usual, that depends. Generally, rate limiters are your first line of defense against DDoS and are implemented at the load balancer/reverse proxy level. More application-aware thresholds can be placed at your facade layer. Try to keep them abstracted from the application.
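For the reverse-proxy-level limiter, Nginx's `limit_req` module can enforce the 5-requests-per-minute example from above without adding any hop (the zone name and upstream name below are illustrative):

```nginx
http {
    # One shared-memory zone keyed by client IP, 5 requests per minute.
    limit_req_zone $binary_remote_addr zone=api:10m rate=5r/m;

    server {
        location / {
            limit_req zone=api burst=10 nodelay;  # absorb short bursts, reject the rest
            proxy_pass http://facade_upstream;    # existing facade layer, unchanged
        }
    }
}
```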
Circuit breakers - use them when you are making an unreliable call downstream: when the downstream can take more time than you intend, or when you want your service to keep functioning for some time regardless of the 3rd party's availability. You can also use them to keep your application responsive at the expense of the 3rd party's result. For example, if your downstream does not respond within 100ms with live results, you trip your circuit at 100ms and show the user older cached/default results.
Upvotes: 7