dev_feed

Reputation: 719

Why is aws lambda invocation client incorrectly returning ClientExecutionTimeoutException?

We are hitting this problem deterministically and aren't sure where we're misconfigured. For Lambdas that run for less than ~5 minutes, our invocation successfully wraps up ~0.5 seconds after the Lambda completes. For anything running longer than that, however, the Lambda logs show that the function completes, but our client invocation throws a ClientExecutionTimeoutException after 15 minutes.

After encountering the problem with other (otherwise successful) lambdas, we created a basic test lambda on Node with a sleep function and have been able to deterministically reproduce the issue:

function sleep(s) {
  return new Promise(resolve => setTimeout(resolve, s * 1000));
}
const sleepSeconds = 60 * 5; // 5 minutes, expressed in seconds because sleep() takes seconds
exports.handler = async (event) => {
    console.log(`received lambda invocation, sleeping ${sleepSeconds} seconds`);
    const response = {
        statusCode: 200,
        body: JSON.stringify(`finished running, slept for ${sleepSeconds} seconds`),
    };
    await sleep(sleepSeconds);
    console.log('finished sleeping');
    return response;
};

Our lambda invocation client is using these client configs:

clientConfig.setRetryPolicy(PredefinedRetryPolicies.NO_RETRY_POLICY);
clientConfig.setMaxErrorRetry(0);
clientConfig.setSocketTimeout(15 * 60 * 1000);
clientConfig.setRequestTimeout(15 * 60 * 1000);
clientConfig.setClientExecutionTimeout(15 * 60 * 1000);

Is there a ~5 minute timeout config we're missing?

Upvotes: 7

Views: 1637

Answers (3)

dev_feed

Reputation: 719

I've accepted Ezequiel's answer since it was technically a networking / OS issue, but here is a more detailed result:

We had to ensure all relevant clients were configured to keep their TCP connections alive. We then had to add these properties to the /etc/sysctl.conf file on our EC2 instance, which resides in a private subnet, because the NAT gateway kills connections that have been idle for more than 350 seconds:

net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 100
net.ipv4.tcp_keepalive_probes = 6
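
As a rough sanity check of those numbers, here is a minimal Java sketch. The class and method names are hypothetical and chosen only for illustration. It checks that the first keep-alive probe (sent after tcp_keepalive_time seconds of idleness) goes out before the 350-second idle cutoff, and it enables SO_KEEPALIVE on a plain socket, since on Linux these sysctl values only apply to sockets that have opted in to keep-alive.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

// Hypothetical helper for illustration; not part of any AWS SDK.
class KeepAliveCheck {

    // True when the first probe (sent after keepaliveTimeSec of idleness)
    // goes out before the middlebox drops the idle connection.
    static boolean probesBeforeIdleTimeout(int keepaliveTimeSec, int idleTimeoutSec) {
        return keepaliveTimeSec < idleTimeoutSec;
    }

    // Worst-case seconds until the kernel gives up on an unresponsive peer:
    // tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes.
    static int secondsUntilPeerDeclaredDead(int timeSec, int intvlSec, int probes) {
        return timeSec + intvlSec * probes;
    }

    public static void main(String[] args) {
        // Values from the sysctl.conf lines above; 350 s is the NAT gateway idle timeout.
        System.out.println(probesBeforeIdleTimeout(300, 350));
        System.out.println(secondsUntilPeerDeclaredDead(300, 100, 6));

        // The sysctl values only affect sockets that opt in via SO_KEEPALIVE.
        try (Socket socket = new Socket()) {
            socket.setKeepAlive(true);
            System.out.println(socket.getKeepAlive());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the Lambda client itself, the AWS SDK for Java v1 ClientConfiguration appears to expose a comparable switch (setUseTcpKeepAlive), so that is probably where the client-side setting would go. Treat that as something to confirm against your SDK version.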

Upvotes: 3

hongdeshuai

Reputation: 69

I have run into errors like this. The problem is with the Lambda context: your function might be reporting failure rather than success because it never finished by signalling success on the context. Please check that you do this when the Lambda finishes.

Thanks.

Upvotes: 1

Ezequiel

Reputation: 3592

The Javadoc in aws-sdk-java says:

 For functions with a long timeout, your client might be disconnected during synchronous invocation while it waits for a response. Configure your HTTP client, SDK, firewall, proxy, or operating system to allow for long connections with timeout or keep-alive settings.

On the other hand, AWS Lambda's maximum timeout used to be 5 minutes and was later raised to 15 minutes.

I would:

  1. Check that the client SDK version is up to date
  2. Check that the connection is not being closed by your network
  3. Consider moving to an async invocation via AWSLambdaAsyncClient.invokeAsync() for long-running invocations.
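
To illustrate the second check, here is a small self-contained Java sketch. The class name is hypothetical, and a loopback server stands in for the real network path. It shows why a connection that goes quiet surfaces as a timeout rather than a connection error: a blocking read cannot tell a slow peer from a response that was dropped along the way, so it simply waits until the configured socket timeout expires.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: a local server that accepts a connection and then
// stays silent stands in for a response that never reaches the client.
class SilentPeerDemo {

    // Returns true if the blocking read gave up because of the socket timeout.
    static boolean readTimesOut(int socketTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setSoTimeout(socketTimeoutMillis);
            try {
                client.getInputStream().read(); // blocks: the peer never writes anything
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the only signal the client gets is its own timeout
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readTimesOut(200));
    }
}
```

In the scenario from the question, the same thing would happen on a much longer scale: the client keeps waiting until its own 15-minute timeout fires, even though the Lambda finished long before.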

Upvotes: 7
