Intermittent DynamoDB DAX errors: NoRouteException during cluster refresh

Question

Via CloudFormation, I have a setup that includes DynamoDB tables, a DAX cluster, a VPC, Lambdas (running inside the VPC), Security Groups (allowing access to port 8111), and so on.

Everything works, except when it doesn't.

I can access DAX from my VPC'd Lambdas almost all of the time, but occasionally they get NoRouteException errors, seemingly at random. Here's the output from CloudWatch for a single Lambda function doing the exact same thing each time (a DAX get). Notice how it works, fails, and then works again:

/aws/lambda/BigOnion_accountGet START RequestId: 2b732899-f380-11e7-a650-cbfe0f7dfb3d Version: $LATEST
/aws/lambda/BigOnion_accountGet END RequestId: 2b732899-f380-11e7-a650-cbfe0f7dfb3d
/aws/lambda/BigOnion_accountGet REPORT RequestId: 2b732899-f380-11e7-a650-cbfe0f7dfb3d  Duration: 58.24 ms  Billed Duration: 100 ms     Memory Size: 768 MB Max Memory Used: 48 MB
/aws/lambda/BigOnion_accountGet START RequestId: 3b63a928-f380-11e7-a116-5bb37bb69bee Version: $LATEST
/aws/lambda/BigOnion_accountGet END RequestId: 3b63a928-f380-11e7-a116-5bb37bb69bee
/aws/lambda/BigOnion_accountGet REPORT RequestId: 3b63a928-f380-11e7-a116-5bb37bb69bee  Duration: 35.01 ms  Billed Duration: 100 ms     Memory Size: 768 MB Max Memory Used: 48 MB
/aws/lambda/BigOnion_accountGet START RequestId: 4b7fa7f2-f380-11e7-a0c8-513a66a11e7a Version: $LATEST
/aws/lambda/BigOnion_accountGet 2018-01-07T07:56:40.643Z    3b63a928-f380-11e7-a116-5bb37bb69bee    caught exception during cluster refresh: { Error: NoRouteException: not able to resolve address
    at DaxClientError (/var/task/index.js:545:5)
    at AutoconfSource._resolveAddr (/var/task/index.js:18400:23)
    at _pull (/var/task/index.js:18421:20)
    at _pullFrom.then.catch (/var/task/index.js:18462:18)
  time: 1515311800643,
  code: 'NoRouteException',
  retryable: true,
  requestId: null,
  statusCode: -1,
  _tubeInvalid: false,
  waitForRecoveryBeforeRetrying: false }
/aws/lambda/BigOnion_accountGet 2018-01-07T07:56:40.682Z    3b63a928-f380-11e7-a116-5bb37bb69bee    Error: NoRouteException: not able to resolve address
    at DaxClientError (/var/task/index.js:545:5)
    at AutoconfSource._resolveAddr (/var/task/index.js:18400:23)
    at _pull (/var/task/index.js:18421:20)
    at _pullFrom.then.catch (/var/task/index.js:18462:18)
/aws/lambda/BigOnion_accountGet END RequestId: 4b7fa7f2-f380-11e7-a0c8-513a66a11e7a
/aws/lambda/BigOnion_accountGet REPORT RequestId: 4b7fa7f2-f380-11e7-a0c8-513a66a11e7a  Duration: 121.24 ms Billed Duration: 200 ms     Memory Size: 768 MB Max Memory Used: 48 MB
/aws/lambda/BigOnion_accountGet START RequestId: 5b951673-f380-11e7-9818-f1effc29edd5 Version: $LATEST
/aws/lambda/BigOnion_accountGet END RequestId: 5b951673-f380-11e7-9818-f1effc29edd5
/aws/lambda/BigOnion_accountGet REPORT RequestId: 5b951673-f380-11e7-9818-f1effc29edd5  Duration: 39.42 ms  Billed Duration: 100 ms     Memory Size: 768 MB Max Memory Used: 48 MB
/aws/lambda/BigOnion_siteCreate START RequestId: 0ec60080-f380-11e7-afea-a95d25c6e53f Version: $LATEST
/aws/lambda/BigOnion_siteCreate END RequestId: 0ec60080-f380-11e7-afea-a95d25c6e53f
/aws/lambda/BigOnion_siteCreate REPORT RequestId: 0ec60080-f380-11e7-afea-a95d25c6e53f  Duration: 3.48 ms   Billed Duration: 100 ms     Memory Size: 768 MB Max Memory Used: 48 MB

Any ideas what it could be?

It's presumably not the VPC or security group configuration, since access is perfectly fine the vast majority of the time. I have a wide CIDR range with plenty of free IPs, so I don't think it's anything related to ENI provisioning... but what else?

The only hint I have is the initial error which states "caught exception during cluster refresh". What exactly is a "cluster refresh" and how could it lead to these failures?
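
Since the error object in the log reports `code: 'NoRouteException'` and `retryable: true`, one stopgap would be to retry the call when exactly that error comes back. The following is only a minimal sketch under assumptions: `isRetryableDaxError` and `withRetry` are hypothetical helper names (not part of the DAX client), the attempt count and delays are arbitrary, and it does nothing to explain or fix the underlying address resolution failure.

```javascript
// Hypothetical helpers: these names are illustrative, not part of the DAX client.

// True only for errors shaped like the one in the log above
// (code 'NoRouteException' and retryable === true).
function isRetryableDaxError(err) {
  return Boolean(err) && err.code === 'NoRouteException' && err.retryable === true;
}

// Runs an async operation, retrying with exponential backoff when it
// fails with a retryable NoRouteException. Any other error is rethrown
// immediately. The defaults (3 attempts, 50 ms base delay) are arbitrary.
async function withRetry(operation, { attempts = 3, baseDelayMs = 50 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    try {
      return await operation();
    } catch (err) {
      if (!isRetryableDaxError(err)) {
        throw err;
      }
      lastError = err;
      if (attempt < attempts - 1) {
        // Wait 50 ms, 100 ms, 200 ms, ... before the next attempt.
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Assumed usage, where `docClient` is a DocumentClient backed by DAX
// and `params` is the usual get request:
//   const result = await withRetry(() => docClient.get(params).promise());
```

This only covers errors that reach the request itself; the "caught exception during cluster refresh" line appears to be logged by the client internally, so a wrapper like this would not suppress it.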
