isaac9A

Reputation: 903

mongo cursor timeout

I am trying to aggregate some records in a Mongo database using the Node driver. I first $match on the org, frd, and sl fields (these are indexed). If I only include a few companies in the array that the org field is matched against, the query runs fine and works as expected. However, when I include all of the clients in the array, I always get:

MongoError: getMore: cursor didn't exist on server, possible restart or timeout?

I have tried playing with the allowDiskUse and batchSize settings, but nothing seems to work. With all the client strings in the array, the aggregation runs for ~5 hours before throwing the cursor error. Any ideas? Below is the pipeline along with the actual aggregate command.

Setting up the aggregation pipeline:

var aggQuery = [

  {
    $match: { // all clients, from the last three days, and scored
      org: { $in: array }, // this is the array I am talking about
      frd: { $gte: _.last(util.lastXDates(3)) },
      sl: true
    }
  },

  {
    $group: { // group by ISP and make fields for the calculations
      _id: "$gog",
      count: { $sum: 1 },
      countRisky: {
        $sum: {
          $cond: {
            if: { $gte: ["$scr", 65] },
            then: 1,
            else: 0
          }
        }
      },
      countTimeZoneRisky: {
        $sum: {
          $cond: {
            if: { $eq: ["$gmt", "$gtz"] },
            then: 0,
            else: 1
          }
        }
      }
    }
  },

  {
    $match: { // only keep groups with count >= 500
      count: { $gte: 500 }
    }
  },

  {
    $project: { // rename _id to ISP, only show relevant fields
      _id: 0,
      ISP: "$_id",
      percentRisky: {
        $multiply: [{ $divide: ["$countRisky", "$count"] }, 100]
      },
      percentTimeZoneDiscrancy: {
        $multiply: [{ $divide: ["$countTimeZoneRisky", "$count"] }, 100]
      },
      count: 1
    }
  },

  {
    $sort: { // sort by percentRisky and then by count
      percentRisky: 1,
      count: 1
    }
  }

];

Running the aggregation:

  var cursor = reportingCollections.hitColl.aggregate(aggQuery, {
    allowDiskUse: true,
    cursor: {
      batchSize: 40000
    }
  });

  console.log('Writing data to csv ' + currentFileNamePrefix + '!');

  //iterate through cursor and write documents to CSV
  cursor.each(function (err, document) {
    //write each document to csv file
    //maybe start a nuclear war
  });

Upvotes: 1

Views: 4184

Answers (2)

pkopac

Reputation: 1035

To circumvent such problems, I'd recommend running the aggregation or map-reduce directly through the mongo shell client. There you can add the notimeout option.
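
For reference, here is roughly what that notimeout option looks like on a shell cursor. This is only a minimal sketch, with db.hits as a made-up collection name and a trivial filter:

  // mongo shell: mark a cursor as non-timing-out before a long iteration
  var cur = db.hits.find({ sl: true }).addOption(DBQuery.Option.noTimeout);

  // newer shells expose the same flag as a helper:
  // var cur = db.hits.find({ sl: true }).noCursorTimeout();

  cur.forEach(function (doc) {
    printjson(doc); // the server won't reap this cursor after the 10-minute idle window
  });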

The default cursor timeout is 10 minutes (obviously useless for long, time-consuming queries), and as far as I know there's currently no way to set a different value, only to make it infinite with the aforementioned option. The timeout hits you especially with high batch sizes, because if processing one batch of incoming docs takes more than 10 minutes, the cursor has already been deleted by the time you ask the mongo server for more.

I don't know your use case, but if it's a web view, it should only be running fast queries/aggregations.

BTW, I don't think this has changed with 3.0.*.

Upvotes: 0

Christian P

Reputation: 12240

You're calling the aggregate method, which, unlike find(), doesn't return a cursor by default. To get the results back as a cursor, you must add the cursor option to the options object. However, a timeout setting for the aggregation cursor is (currently) not supported; the native Node.js driver only supports the batchSize setting.

You would set the batchSize option like this:

var cursor = coll.aggregate(query, { cursor: { batchSize: 100 } }, writeResultsToCsv);
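
For completeness, a sketch of what consuming that cursor could look like end to end; coll, aggQuery and the CSV step are placeholders, and the callback is dropped so the returned cursor is iterated, as in the question:

  var cursor = coll.aggregate(aggQuery, {
    allowDiskUse: true,
    cursor: { batchSize: 100 } // smaller batches keep each getMore round trip short
  });

  cursor.each(function (err, doc) {
    if (err) return console.error(err);
    if (doc === null) return; // null signals the cursor is exhausted
    // write doc to the CSV file here
  });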

Upvotes: 4
