Reputation: 719
We are hitting this problem consistently and can't tell what we have misconfigured. For Lambdas that run for less than ~5 minutes, our invocation successfully wraps up ~0.5 seconds after the Lambda completes. For anything that runs longer, however, the Lambda logs show that the function completes, but our client invocation throws a ClientExecutionTimeoutException after 15 minutes.
After encountering the problem with other (otherwise successful) lambdas, we created a basic test lambda on Node with a sleep function and have been able to deterministically reproduce the issue:
// Resolve after s seconds.
function sleep(s) {
  return new Promise(resolve => setTimeout(resolve, s * 1000));
}
// sleep() takes seconds, so this is 5 minutes.
const sleepSeconds = 60 * 5;
exports.handler = async (event) => {
  console.log(`received lambda invocation, sleeping ${sleepSeconds} seconds`);
  const response = {
    statusCode: 200,
    body: JSON.stringify(`finished running, slept for ${sleepSeconds} seconds`),
  };
  await sleep(sleepSeconds);
  console.log('finished sleeping');
  return response;
};
Our lambda invocation client is using these client configs:
clientConfig.setRetryPolicy(PredefinedRetryPolicies.NO_RETRY_POLICY);
clientConfig.setMaxErrorRetry(0);
clientConfig.setSocketTimeout(15 * 60 * 1000);
clientConfig.setRequestTimeout(15 * 60 * 1000);
clientConfig.setClientExecutionTimeout(15 * 60 * 1000);
Is there a ~5 minute timeout config we're missing?
Upvotes: 7
Views: 1637
Reputation: 719
I've accepted Ezequiel's answer since it was technically a networking / OS issue, but here is a more detailed result:
We had to ensure that all relevant clients were configured to keep TCP connections alive. We then had to add these properties to the /etc/sysctl.conf file on our EC2 instance, which resides in a private subnet, because the NAT gateway kills connections that have been idle for more than 350 seconds:
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 100
net.ipv4.tcp_keepalive_probes = 6
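To make the arithmetic behind these values explicit, here is a minimal Java sketch. The class and method names are made up for illustration, and it assumes the usual Linux keep-alive behaviour: the first probe goes out after tcp_keepalive_time seconds of idleness, and further probes follow every tcp_keepalive_intvl seconds until tcp_keepalive_probes of them go unanswered. The point is that the first probe (300s) goes out before the 350s NAT idle timeout, so the connection should not look idle to the gateway.

```java
public class KeepAliveTimeline {
    // Values copied from the sysctl settings above.
    static final int KEEPALIVE_TIME_S = 300;
    static final int KEEPALIVE_INTVL_S = 100;
    static final int KEEPALIVE_PROBES = 6;
    // NAT gateway idle timeout, as stated above.
    static final int NAT_IDLE_TIMEOUT_S = 350;

    /** Seconds of idleness before the first keep-alive probe is sent. */
    static int firstProbeAfterSeconds() {
        return KEEPALIVE_TIME_S;
    }

    /** Approximate seconds of silence before an unresponsive peer is given up on. */
    static int deadPeerDetectedAfterSeconds() {
        return KEEPALIVE_TIME_S + KEEPALIVE_INTVL_S * KEEPALIVE_PROBES;
    }

    /** True if the gap between packets never reaches the NAT idle timeout. */
    static boolean probesBeatNatTimeout() {
        return firstProbeAfterSeconds() < NAT_IDLE_TIMEOUT_S
                && KEEPALIVE_INTVL_S < NAT_IDLE_TIMEOUT_S;
    }

    public static void main(String[] args) {
        System.out.println("first probe after (s): " + firstProbeAfterSeconds());
        System.out.println("dead peer detected after (s): " + deadPeerDetectedAfterSeconds());
        System.out.println("probes beat NAT timeout: " + probesBeatNatTimeout());
    }
}
```

With the default tcp_keepalive_time of 7200 seconds, the first probe would arrive long after the gateway had already dropped the connection, which is why the setting had to be lowered.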
Upvotes: 3
Reputation: 69
I have run into errors like this. The problem may be with the Lambda context: if you don't finish the function by signalling success through the context, it might report failure instead of success. Please check that you do this when the Lambda finishes.
Thanks.
Upvotes: 1
Reputation: 3592
The Javadoc in aws-sdk-java says:
For functions with a long timeout, your client might be disconnected during synchronous invocation while it waits for a response. Configure your HTTP client, SDK, firewall, proxy, or operating system to allow for long connections with timeout or keep-alive settings.
On the other hand, AWS Lambda executions used to be limited to 5 minutes; the limit was later raised to 15 minutes.
I would check:
AWSLambdaAsyncClient.invokeAsync()
for long-running invocations.
Upvotes: 7