MongoDB Node.js each method

Question

I have an array of data which I'll store in the database. When I check whether the data already exists, the each() callback is called twice, even though I'm using limit(1). I have no clue what's going on here...

collection.find({
    month: 'april'
}).limit(1).count(function(err, result){
    console.log('counter', result);
});

collection.find({
    month: 'april'
}).limit(1).each(function(err, result){
    console.log('each', result);
});

collection.find({
    month: 'april'
}).limit(1).toArray(function(err, result){
    console.log('toArray', result);
});

At this point, exactly one document with month 'april' is already stored in the collection. The above queries generate the following output:

counter 1
each {...}
each null
toArray {...}

In the mongo shell I have checked the count() and forEach() methods. Everything works as expected. Is it a driver problem? Am I doing anything wrong?

Gergo Erdosi · Accepted Answer

This is the expected behavior. The driver calls your callback once for each document, and then calls it one more time with null to indicate that there are no documents left. You can see this in the driver's examples too:

// Find returns a Cursor, which is Enumerable. You can iterate:
collection.find().each(function(err, item) {
  if(item != null) console.dir(item);
});
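The same null check can be wrapped in a small helper that collects every document and fires a single completion callback when the null sentinel arrives. This is a sketch, not driver code: makeArrayCursor is a hypothetical in-memory stand-in for a cursor that mimics the each contract described above (one call per document, then one final call with null), and collectEach is likewise a hypothetical name. The stand-in is synchronous for simplicity, while the real driver calls back asynchronously.

```javascript
// Hypothetical stand-in for a driver cursor (synchronous, in-memory):
// calls back once per document, then once more with (null, null).
function makeArrayCursor(docs) {
  return {
    each(callback) {
      for (const doc of docs) callback(null, doc);
      callback(null, null); // end-of-cursor sentinel
    },
  };
}

// Collect every document from an each()-style cursor, treating null as
// "no more documents" rather than as a real result.
function collectEach(cursor, done) {
  const results = [];
  cursor.each(function (err, item) {
    if (err) return done(err);
    if (item === null) return done(null, results); // cursor exhausted
    results.push(item);
  });
}

// One matching document produces one real item plus the final null.
const seen = [];
makeArrayCursor([{ month: "april" }]).each(function (err, item) {
  seen.push(item);
});
console.log(seen); // [ { month: 'april' }, null ]

collectEach(makeArrayCursor([{ month: "april" }]), function (err, docs) {
  console.log("collected", docs); // collected [ { month: 'april' } ]
});
```

Handling the null inside one helper keeps the "is this the end?" check in a single place instead of repeating it in every each callback.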

If you are interested in the details, you can check the source code of the driver's each method:

if(this.items.length > 0) {
  // Trampoline all the entries
  while(fn = loop(self, callback)) fn(self, callback);
  // Call each again
  self.each(callback);
} else {
  self.nextObject(function(err, item) {

    if(err) {
      self.state = Cursor.CLOSED;
      return callback(utils.toError(err), item);
    }

    if(item == null) return callback(null, null);  // <-- the final null
    callback(null, item);
    self.each(callback);
  })
}

In this code, each iterates through the buffered items using loop, which shifts them off the array one at a time (var doc = self.items.shift();). When this.items.length reaches 0, the else block runs and tries to fetch the next document from the cursor. If there are no more documents, nextObject passes null as item, so if(item == null) return callback(null, null); executes. The callback is therefore called with null, and that is the null you see in the console.
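The control flow described above can be modelled in a few lines. This is a simplified, synchronous sketch, not the driver's actual code: SketchCursor and its items buffer and nextObject method only imitate the roles of the driver's internals, the server is replaced by an in-memory array of remaining documents, and the trampolining is replaced by a plain loop.

```javascript
// Simplified, synchronous model of the each() logic quoted above.
// Not the driver's code: the "server" is just an in-memory array.
class SketchCursor {
  constructor(firstBatch, remaining) {
    this.items = firstBatch;    // documents already buffered on the client
    this.remaining = remaining; // documents the "server" has not sent yet
  }

  // Stand-in for nextObject(): hands back the next document from the
  // "server", or null once nothing is left.
  nextObject(callback) {
    const item = this.remaining.length > 0 ? this.remaining.shift() : null;
    callback(null, item);
  }

  each(callback) {
    if (this.items.length > 0) {
      // Drain the buffered batch, then call each again (like the driver).
      while (this.items.length > 0) callback(null, this.items.shift());
      this.each(callback);
    } else {
      this.nextObject((err, item) => {
        if (err) return callback(err, item);
        if (item == null) return callback(null, null); // the extra null
        callback(null, item);
        this.each(callback);
      });
    }
  }
}

const calls = [];
new SketchCursor([{ month: "april" }], []).each((err, item) => calls.push(item));
console.log(calls); // [ { month: 'april' }, null ]
```

With limit(1) and one match, the buffer holds a single document; once it is drained, the next lookup finds nothing and produces the trailing null.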

This is needed because MongoDB returns matching documents through a cursor. If you have millions of documents in the collection and you run find(), they are not all returned at once, because you would run out of memory. Instead, the cursor delivers them in batches: "For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte." So this.items.length is the number of items in the current batch, which is not necessarily the total number of documents matched by the query. That's why, when you iterate through the documents and this.items.length reaches 0, the driver uses the cursor to check whether there are more matching documents. If there are, it loads the next batch; otherwise it returns null.
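The quoted first-batch rule can be expressed as a small function. This only illustrates the sentence quoted from the MongoDB documentation and is not the server's implementation: firstBatchCount is a hypothetical name, document sizes are supplied as plain byte counts, and 1 megabyte is assumed to be 1024 * 1024 bytes.

```javascript
const MAX_FIRST_BATCH_DOCS = 101;
const ONE_MEGABYTE = 1024 * 1024; // assumption: 1 MB = 2^20 bytes

// How many documents the first batch would hold under the quoted rule:
// stop at 101 documents, or as soon as the running total exceeds 1 MB.
function firstBatchCount(docSizes) {
  let count = 0;
  let bytes = 0;
  for (const size of docSizes) {
    if (count === MAX_FIRST_BATCH_DOCS || bytes > ONE_MEGABYTE) break;
    count += 1;
    bytes += size;
  }
  return count;
}

console.log(firstBatchCount(new Array(500).fill(200)));        // 101
console.log(firstBatchCount(new Array(500).fill(300 * 1024))); // 4
console.log(firstBatchCount([512]));                           // 1
```

Small documents hit the 101-document cap first, while 300 KB documents stop at 4, the first count whose total exceeds 1 MB.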

It's easier to understand this with a large limit. With limit(100000), for example, you would need a lot of memory if MongoDB returned all 100000 documents at once, not to mention how slow processing would be. Instead, MongoDB returns the results in batches. Say the first batch contains 101 documents. Then this.items.length becomes 101, but that is only the size of the first batch, not the total number of results. When you iterate through the results and reach the item after the last one in the current batch (the 102nd in this case), the driver uses the cursor to check whether there are more matching documents. If there are, the next batch is loaded; otherwise null is returned.
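A generator gives a compact picture of this lazy, batch-by-batch fetching. It is a sketch under simplifying assumptions: fetchBatch is a hypothetical stand-in for a round trip to the server, every batch is assumed to hold 101 documents (in reality only the first batch follows the quoted 101-document rule), and documents are just numbers.

```javascript
// Hypothetical server round trip: returns up to `size` documents starting
// at `offset`, never going past `limit` matching documents.
function fetchBatch(offset, size, limit) {
  const end = Math.min(offset + size, limit);
  const batch = [];
  for (let i = offset; i < end; i++) batch.push(i);
  return batch;
}

// Lazily yields documents, fetching a new batch only when the current one
// is used up, so only one batch is held at a time.
function* lazyCursor(limit, batchSize = 101, stats = { fetches: 0 }) {
  let offset = 0;
  while (offset < limit) {
    const batch = fetchBatch(offset, batchSize, limit);
    stats.fetches += 1;
    if (batch.length === 0) return; // guard against batchSize 0
    offset += batch.length;
    yield* batch;
  }
}

const stats = { fetches: 0 };
let docCount = 0;
for (const doc of lazyCursor(100000, 101, stats)) docCount += 1;
console.log(docCount, stats.fetches); // 100000 991
```

All 100000 documents are visited, but at most 101 are buffered at any moment, at the cost of 991 round trips.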

But you don't have to deal with nextObject() in your code; you only need to check for null, as in the MongoDB example.
