We are building a platform that allows an on-premise API to be invoked from the cloud. For this we use WCF Relay, which suits us because we need to create relays dynamically: we have an onboarding API that validates the customer's license and, if the license is valid, creates the relay in Azure so that the cloud API can communicate with the on-premise API.
Our QA team is currently running load tests to find out how much traffic the platform can handle. During these tests they observed strange behaviour when running a test with 50 concurrent requests from JMeter.
In this scenario (50 concurrent requests), some requests fail with 502 Bad Gateway responses. These responses come from the relay, since their content type is XML.
The 502 Bad Gateway response says that the listener did not accept the connection within the allowed interval. The strange part is that once we receive a 502 Bad Gateway response, no further requests reach the listener (the on-premise API). The only way to make communication work again is to close the listener and start it again.
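To reproduce the "stuck after the first 502" pattern outside JMeter, a minimal Python sketch like the following can fire a burst of 50 concurrent requests and then check a few follow-up probes. The relay URL is a placeholder, the sketch sends no relay authentication token (an assumption; add whatever your relay requires), and `make_http_send`, `run_burst` and `listener_looks_stuck` are hypothetical helpers, not part of any Azure SDK.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Placeholder address; in our platform the real one is discovered at runtime.
RELAY_URL = "https://<namespace>.servicebus.windows.net/<path>"


def make_http_send(url: str, timeout: float = 60.0) -> Callable[[int], int]:
    """Return a function that issues one GET and returns the HTTP status (0 on network error)."""
    def send(_request_id: int) -> int:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status
        except urllib.error.HTTPError as err:  # 4xx/5xx responses, e.g. the relay's 502
            return err.code
        except OSError:  # DNS failure, refused connection, socket timeout
            return 0
    return send


def run_burst(send: Callable[[int], int], concurrency: int = 50) -> List[int]:
    """Send `concurrency` requests at the same time; statuses come back in request order."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(send, range(concurrency)))


def listener_looks_stuck(burst: List[int], follow_ups: List[int]) -> bool:
    """True if the burst produced a 502 and none of the later probes succeeded."""
    return 502 in burst and not any(200 <= s < 300 for s in follow_ups)
```

Usage: `send = make_http_send(RELAY_URL)`, then `burst = run_burst(send)`, then a handful of sequential `send(i)` calls as follow-ups. `Counter(burst)` gives the status distribution, and if `listener_looks_stuck(burst, follow_ups)` returns True you are looking at the same symptom described above.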
As far as I know, WCF Relay supports load balancing, choosing at random the listener that processes each request. I found a thread in the GitHub repository of the WCF Relay service that describes the load-balancing algorithm as follows:
If the algorithm works as described, the default behaviour is very vulnerable to DoS attacks: once a rendezvous attempt fails, the listener is removed from the list of listeners to try. This seems like a bad design, because the only way to restore communication is to reconnect the listener. In our case that means a manual action by the user, since we host the WCF service in a Windows service (the user has to restart the Windows service after a failed rendezvous attempt).
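A toy Python model makes the DoS concern concrete. It encodes our reading of the described algorithm (random choice among the remaining listeners, and a listener whose rendezvous fails is dropped until it reconnects). `RelayRoutingModel` and its methods are hypothetical; this is not the Azure Relay implementation.

```python
import random
from typing import Callable, List, Optional, Set


class RelayRoutingModel:
    """Toy model of the listener selection described in the GitHub thread.

    Assumption: this mirrors the description as we understand it, not the real
    service. Each request tries the remaining candidate listeners in random
    order; a listener whose rendezvous fails is dropped from the candidate set
    until it reconnects.
    """

    def __init__(self, listeners: List[str], seed: int = 0) -> None:
        self.candidates: Set[str] = set(listeners)
        self.rng = random.Random(seed)

    def route(self, rendezvous_ok: Callable[[str], bool]) -> Optional[str]:
        """Return the listener that accepted the request, or None (the relay answers 502)."""
        order = sorted(self.candidates)
        self.rng.shuffle(order)
        for listener in order:
            if rendezvous_ok(listener):
                return listener
            self.candidates.discard(listener)  # not tried again until reconnect()
        return None

    def reconnect(self, listener: str) -> None:
        """Closing and reopening the listener puts it back into the candidate set."""
        self.candidates.add(listener)
```

With a single listener, which is our setup, one rendezvous that times out under load is enough to make every later request end in a 502 even though the listener itself is healthy again; only `reconnect()` restores routing. That is the behaviour the QA team observed.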
The odd thing is that we apply the ConnectionStatusBehavior to the endpoint to log any connection issue, and we don't see anything unusual in the logs: the service apparently stays connected/online, and Azure Monitor also shows the listener as connected.
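Because the connection status keeps reporting the listener as online, a status-based check would not detect this state. One possible stopgap, sketched here in Python under stated assumptions, is an end-to-end probe through the relay that restarts the listener after several consecutive 502 responses. `probe` (for example a lightweight GET through the relay address) and `restart_listener` (for example closing and reopening the ServiceHost, or restarting the Windows service) are hypothetical callables you would supply; the threshold and interval values are arbitrary.

```python
import time
from typing import Callable, Optional


def watchdog(probe: Callable[[], int],
             restart_listener: Callable[[], None],
             max_consecutive_502: int = 3,
             interval_s: float = 10.0,
             rounds: Optional[int] = None,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Probe the relay periodically; restart the listener after N consecutive 502s.

    Runs forever when `rounds` is None; otherwise returns the number of restarts.
    """
    restarts = 0
    consecutive_502 = 0
    done = 0
    while rounds is None or done < rounds:
        done += 1
        status = probe()
        # Any non-502 answer resets the streak; only a run of 502s triggers a restart.
        consecutive_502 = consecutive_502 + 1 if status == 502 else 0
        if consecutive_502 >= max_consecutive_502:
            restart_listener()  # hypothetical: reopen the ServiceHost or restart the service
            restarts += 1
            consecutive_502 = 0
        sleep(interval_s)
    return restarts
```

Requiring several consecutive 502s avoids restarting on a single transient failure, at the cost of a longer outage before recovery. This only automates the manual restart; it does not change how the relay picks listeners.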
Is there a way to configure a different behaviour when a rendezvous attempt fails?
Our current WCF configuration is as follows:
Current binding configuration (WebHttpRelayBinding)
Current ServiceThrottlingBehavior
Current WCF service behaviour
Client-side
The WCF relay is invoked from an ASP.NET Core 2.2 application using HttpClient, because the relay address is discovered at runtime (so we cannot use a WCF proxy).
The ASP.NET Core application is hosted in Azure.
Listener
Deployed on a virtual machine in Azure. The system connectivity mode is set to ConnectivityMode.Https.