monkey

Reputation: 1597

Very large Promise all array showing periodic failure invoked from Lambda

I have written a piece of code in Lambda to query a database a large number of times. To speed things along, I use a promise array and await Promise.all(promises)... The thing is, I seem to be experiencing a seemingly random failure pushing promises onto the array. It's not repeatable, which makes it look like a resource allocation problem in AWS, but I've already changed the DynamoDB allocation to on-demand to account for that. Of just over 1,000 promises, about 10 seem to fail. When I put about 12,000 promises in my array, similarly, about 1% seem to fail. Which is a giant pain.

My code looks like this:

let month_index = 0;

console.log('promises[] = ' + scanResults.length + ' devices x ' + monthsToReport + ' months = ' + scanResults.length * monthsToReport + ' promises!');

do {
  var monStart = new Date(year, (month-1)); // July 2019.
  var ts_monStart = Math.round((monStart).getTime() / 1000);

  if (month === 11) {
    month = 0;
    year++
  } else {
    month++;
  }

  var monEnd   = new Date(year, (month-1));
  var ts_monEnd   = Math.round((monEnd).getTime() / 1000);

  console.log("start month = " + ts_monStart + ", end month = " + ts_monEnd);


  // Fill up the array of promises: one for each device's data. The do/while loop repeats for each month.


  var promises = [];

  //     for (let i = 0; i < 500; i++) {
  for (let i = 0; i < scanResults.length; i++) {
    // console.log('i = ', i);

    try {
      // Define the Query Params.
      var dataParams = {
        ExpressionAttributeValues: {
          ":dynamoId": {S: scanResults[i].deviceId},
          ":f" : {N: ts_monStart.toString()},   // fromTime
          ":t" : {N: ts_monEnd.toString()}      // toTime
        },
        KeyConditionExpression: "id = :dynamoId and ts between :f AND :t",
        TableName: "SensorNodeData"
      };

      let month_index_d = month_index; // need let within scope for promises to resolve properly. Don't ask me why?! I don't care.

      // Add a promise to promise array! Callback grabs Count.

      if(scanResults[i].deviceId == "seq" ) {
        console.log('balls! ignoring this one.' +  scanResults[i]);
        scanResults[i].month[month_index_d] = 0;
        continue;
      }



      promises.push(dynamodb.query(dataParams, function(err, data) {
        if (err) {
          console.log("Error Pushing Promise: ", err);
          //        console.log('i = ' + i + ', device = ' + JSON.stringify(scanResults[i]) + ', params = ' + JSON.stringify(dataParams));
        } else {
          scanResults[i].month[month_index_d] = data.Count;
          //             console.log('Device ' + i + ' (' + scanResults[i].deviceId + ')' + ', month_index = ' + month_index_d + ', Count = ' + scanResults[i].month[month_index_d]);
        }
      }).promise());
    } catch(err) {
      console.log('promise ballsed up: ', err);
    }
  }

  // Using promise.all has reduced execution time by factor of how many devices * months! *much* faster.
  try {
    await Promise.all(promises).then( () => {
      console.log('Promises Done!');
    });
  } catch (err) {
    console.log('promise all ballsed up: ', err);
  }


  month_index++;

} while (monEnd < ts_today);

However, in CloudWatch, I can see the following problem about 1% of the time, and not always for the same promises. Very odd and most annoying! I'm pretty rubbish at Node, so I may have done something very wrong. If so, imagine you are explaining the fault to a very dumb person. That will maximise the benefit of the exercise for all involved (particularly me).

INFO    Error Pushing Promise:  { TypeError: Cannot read property 'push' of undefined
    at Request.HTTP_DATA (/var/runtime/node_modules/aws-sdk/lib/event_listeners.js:389:35)
    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:683:14)
    at IncomingMessage.onReadable (/var/runtime/node_modules/aws-sdk/lib/event_listeners.js:289:32)
    at IncomingMessage.emit (events.js:198:13)
    at IncomingMessage.EventEmitter.emit (domain.js:448:20)
    at emitReadable_ (_stream_readable.js:555:12)
    at process._tickCallback (internal/process/next_tick.js:63:19)
  message: 'Cannot read property \'push\' of undefined',
  code: 'TypeError',
  time: 2020-03-29T03:27:28.866Z,

Upvotes: 2

Views: 493

Answers (1)

msbit

Reputation: 4320

If you take a look at the code where the error is encountered (node_modules/aws-sdk/lib/event_listeners.js:389:35), you'll see that this occurs when the lib is trying to read data from the HTTP connection (presumably to DynamoDB):

379     add('HTTP_DATA', 'httpData', function HTTP_DATA(chunk, resp) {
380       if (chunk) {
381         if (AWS.util.isNode()) {
382           resp.httpResponse.numBytes += chunk.length;
383 
384           var total = resp.httpResponse.headers['content-length'];
385           var progress = { loaded: resp.httpResponse.numBytes, total: total };
386           resp.request.emit('httpDownloadProgress', [progress, resp]);
387         }
388 
389         resp.httpResponse.buffers.push(AWS.util.buffer.toBuffer(chunk)); <=== HERE
390       }
391     });

The push referred to is the attempted push onto resp.httpResponse.buffers, not pushing onto your promises array as you might have thought.

My feeling is that making ~12,000 simultaneous connections to DynamoDB is causing some of them to drop off. You've mentioned that you've updated the DynamoDB allocation to on-demand, which would address this eventually, but it's possible that this sudden demand catches the autoscaler off-guard.
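
If the number of simultaneous connections really is the issue, one way to test that is to cap how many queries are in flight at once. The following is only a minimal sketch: `runInBatches` and the `{ ok, value, error }` record shape are hypothetical names made up for this illustration, not part of the AWS SDK, and the batch size of 100 in the usage comment is an arbitrary guess.

```javascript
// Run async task functions in fixed-size batches so that at most
// `batchSize` of them are in flight at any one time.
// Each task is a function that returns a promise when called.
async function runInBatches(tasks, batchSize) {
  const results = [];
  for (let start = 0; start < tasks.length; start += batchSize) {
    const batch = tasks.slice(start, start + batchSize);
    // Start only this batch and wait for all of it before moving on.
    // Failures are recorded rather than thrown, so one bad query
    // does not hide the outcome of the others.
    const settled = await Promise.all(
      batch.map((task) =>
        Promise.resolve()
          .then(task)
          .then(
            (value) => ({ ok: true, value }),
            (error) => ({ ok: false, error })
          )
      )
    );
    results.push(...settled);
  }
  return results;
}

// Hypothetical usage with the question's names (buildParams is assumed):
//   const tasks = scanResults.map((device) =>
//     () => dynamodb.query(buildParams(device)).promise());
//   const outcomes = await runInBatches(tasks, 100);
```

This trades some speed for a bounded number of open requests. If the failures disappear at a small batch size and return as it grows, that would point towards connection pressure rather than table capacity.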

I don't know enough about DynamoDB to talk specifics, but if you wanted to troubleshoot this, you could temporarily allocate the DynamoDB resources slightly above the peak automatic resource allocation and try the same request. If the number of failed queries is reduced, then this seems a likely culprit.
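
To compare runs while troubleshooting, it helps to count failures instead of eyeballing the logs. This is a rough sketch under assumptions: `tallyFailures` is a hypothetical helper, and it assumes each query's outcome has been recorded as an `{ ok, error }` object (for example, inside the existing callback), which the original code does not do.

```javascript
// Summarise query outcomes so that two runs (for example, before and
// after a capacity change) can be compared.
// `outcomes` is assumed to be an array of { ok: boolean, error?: Error }.
function tallyFailures(outcomes) {
  const byCode = {};
  let failed = 0;
  for (const outcome of outcomes) {
    if (outcome.ok) continue;
    failed++;
    // SDK errors in the question's log carry a `code` field
    // (e.g. 'TypeError'); fall back to 'Unknown' when it is missing.
    const code = (outcome.error && outcome.error.code) || 'Unknown';
    byCode[code] = (byCode[code] || 0) + 1;
  }
  return {
    total: outcomes.length,
    failed,
    failureRate: outcomes.length ? failed / outcomes.length : 0,
    byCode
  };
}
```

Logging the returned summary once per month iteration gives a single failure rate per run, which makes it easier to tell whether a change actually reduced the number of failed queries.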

Upvotes: 2
