sfaust

Reputation: 2373

AWS Lambda Force Closes Connection?

I have an API running on AWS Lambda behind API Gateway, and a very small number of users are suddenly getting requests kicked back with "an existing connection was forcibly closed by the remote host", which causes issues in the program. Nothing has changed on the server side in a month or so, but these errors just started showing up. As mentioned, it's not happening to everyone, just a small number of users, but based on the stack trace it seems to be server side, unless I'm misinterpreting it. Here is a sample error stack from a user (with timestamp, as it's from a log file):

2020-03-17 10:12:25,521 (2020-03-17 14:12:25,521 UTC) [23] ERROR RDES.License.Core.Services.Implementation.LicenseValidatorApi - General exception from startup authentication, loop 0
RdRestConsumer.Exceptions.ApiGeneralException: Revolution Design API responded with status code:  :  ---> Flurl.Http.FlurlHttpException: Call failed. An error occurred while sending the request. POST https://*apiUrl*/auth/refresh ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult)
   at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
   --- End of inner exception stack trace ---
   at System.Net.TlsStream.EndWrite(IAsyncResult asyncResult)
   at System.Net.ConnectStream.WriteHeadersCallback(IAsyncResult ar)
   --- End of inner exception stack trace ---
   at System.Net.HttpWebRequest.EndGetRequestStream(IAsyncResult asyncResult, TransportContext& context)
   at System.Net.Http.HttpClientHandler.GetRequestStreamCallback(IAsyncResult ar)
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Flurl.Http.FlurlRequest.<SendAsync>d__19.MoveNext()
   --- End of inner exception stack trace ---
   at Flurl.Http.FlurlRequest.<HandleExceptionAsync>d__23.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Flurl.Http.FlurlRequest.<SendAsync>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Flurl.Http.FlurlRequest.<SendAsync>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Flurl.Http.HttpResponseMessageExtensions.<ReceiveJson>d__0`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at RdRestConsumer.Services.Implementation.AuthService.<RefreshAccessTokenAsync>d__22.MoveNext()
   --- End of inner exception stack trace ---
   at RdRestConsumer.Services.Implementation.AuthService.<RefreshAccessTokenAsync>d__22.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at RdRestConsumer.Services.Implementation.AuthService.<AttemptStartupAuthenticationAsync>d__12.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at RDES.License.Core.Services.Implementation.LicenseValidatorApi.<AttemptStartupAuthenticationAsync>d__50.MoveNext()

Tests we have done and what we have ruled out:

  1. Timeout - Looking at the timing in the log files, the time between the request starting and this error appearing is roughly 1 to 2 seconds, while the timeout for both Lambda and API Gateway is set to 30 seconds, so there is nowhere near enough elapsed time for a timeout.
  2. Memory - I read in an SO post (sorry, I don't have the link anymore as I closed it) that closed connections can be caused by resource issues. I don't see any requests hitting a memory limit, but I upped the memory from 512 MB to 768 MB anyway to test; as far as I can tell it had no effect.

Does anyone know what could be causing this issue?

Upvotes: 1

Views: 1380

Answers (1)

JD D

Reputation: 8137

A few thoughts on this as I think the answer may be multi-faceted.

I have read that ensuring the connection uses TLSv1.2 can reduce the occurrence of this type of dropped-connection issue.

You can enforce TLS 1.2 on the server side by fronting your API Gateway with an API Gateway custom domain name, or by routing the traffic through a CloudFront distribution. Both custom domain names and CloudFront let you restrict connections to TLSv1.2 only.
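As a rough sketch, a custom domain name with a TLSv1.2-only security policy can be created with the AWS CLI; the domain name, certificate ARN, and account ID below are placeholders, not values from the question:

    aws apigateway create-domain-name \
        --domain-name api.example.com \
        --regional-certificate-arn arn:aws:acm:us-east-1:123456789012:certificate/example-id \
        --endpoint-configuration types=REGIONAL \
        --security-policy TLS_1_2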

If you control the client code, you can also ensure that it is configured to use TLSv1.2.
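For a .NET Framework client like the one in the stack trace (System.Net.TlsStream appears there), a minimal sketch would be to opt in to TLS 1.2 once at startup, before any requests are made; TlsConfig and EnforceTls12 are illustrative names, not part of the poster's code:

    using System.Net;

    public static class TlsConfig
    {
        // Call once at application startup, before any HTTP requests are made.
        // HttpWebRequest and HttpClientHandler (both visible in the stack
        // trace) honor this setting on .NET Framework.
        public static void EnforceTls12()
        {
            ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
        }
    }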

On top of that, transient network blips are inevitable when dealing with network traffic, so it's best practice for client code to have retry logic with incremental backoff in place so that failed requests are automatically retried.

We used a different backing technology stack but experienced similar issues when our client base grew and we started receiving more traffic. Since we controlled our client code, we added retry logic, which made our clients much more robust. Here is an AWS blog post that is good reading on the topic: https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
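As a minimal sketch of that idea with Flurl (the library in the stack trace), the helper below retries a POST with exponential backoff plus jitter. The method name, attempt count, and delay values are illustrative assumptions, not the poster's actual code:

    using System;
    using System.Threading.Tasks;
    using Flurl.Http;

    public static class RetryExample
    {
        private static readonly Random Jitter = new Random();

        public static async Task<T> PostWithRetryAsync<T>(string url, object body, int maxAttempts = 3)
        {
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    return await url.PostJsonAsync(body).ReceiveJson<T>();
                }
                catch (FlurlHttpException) when (attempt < maxAttempts)
                {
                    // Exponential backoff with jitter: ~1s, ~2s, ~4s..., plus
                    // up to 1s of randomness so retries from many clients
                    // don't all land at the same instant.
                    var delayMs = (int)(Math.Pow(2, attempt - 1) * 1000) + Jitter.Next(0, 1000);
                    await Task.Delay(delayMs);
                }
            }
        }
    }

Usage would be something like await RetryExample.PostWithRetryAsync<TokenResponse>(url, credentials), so a single dropped connection no longer surfaces to the user.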

Upvotes: 2
