Mongo with java - find query with batchsize

Question

I am executing a find query in MongoDB using Java on a collection, with batchSize set to 500. My collection has 10,000 records, but with batchSize set I only get records 1-500. How do I get the next set of records?

Below is the code snippet:

DBCursor cursor = collection.find(query).batchSize(batchSize);
while(cursor.hasNext()) {
    // write to file.
    DBObject obj = cursor.next();
    objectIdList.add(obj.get("_id"));
}

glytching · Accepted Answer

The DBCursor lets you iterate over the set of documents that match the query passed into the find() method. It lazily fetches these documents from the underlying database in chunks of batchSize.

So, with the default settings the first batch holds 101 documents (IIRC). As your client code iterates beyond the last document of the current batch, the driver (behind the scenes) fetches the next batch. As far as I know, when no batchSize is set these later batches are limited by message size rather than to 101 documents. This continues until whichever of the following occurs first:

All of the documents which are relevant to your query are returned i.e. the cursor is exhausted
Your client stops iterating

The same applies when you set an explicit batchSize. In your case, with batchSize=500, the find() call returns a DBCursor which holds (at most) 500 documents at a time. If more than 500 documents match your query, then as you iterate beyond the 500th document the MongoDB Java driver fetches the next batch (behind the scenes). You do not have to request the next batch yourself; calling hasNext() and next() until the cursor is exhausted is enough.
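To make the "behind the scenes" part concrete, here is a minimal sketch that uses only the Java standard library. It is not the MongoDB driver and not how the driver is actually implemented: FakeServer, BatchingCursor and drain are hypothetical names invented for this illustration, and the "documents" are plain integers. It only shows the general idea of a cursor that buffers one batch and refills itself when the buffer runs dry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Stdlib-only simulation of batched fetching. NOT the MongoDB driver. */
public class BatchingCursorSketch {

    /** Hypothetical stand-in for the server: hands out slices of a fixed result set. */
    static final class FakeServer {
        private final List<Integer> matches;
        private int position = 0;
        int roundTrips = 0; // how many batches were requested

        FakeServer(int matchCount) {
            matches = new ArrayList<>(matchCount);
            for (int i = 0; i < matchCount; i++) {
                matches.add(i);
            }
        }

        /** Returns up to batchSize further results. */
        List<Integer> nextBatch(int batchSize) {
            roundTrips++;
            int end = Math.min(position + batchSize, matches.size());
            List<Integer> batch = new ArrayList<>(matches.subList(position, end));
            position = end;
            return batch;
        }

        boolean exhausted() {
            return position >= matches.size();
        }
    }

    /** Hypothetical cursor: buffers one batch and refills when the buffer is empty. */
    static final class BatchingCursor implements Iterator<Integer> {
        private final FakeServer server;
        private final int batchSize;
        private final Deque<Integer> buffer = new ArrayDeque<>();

        BatchingCursor(FakeServer server, int batchSize) {
            this.server = server;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNext() {
            // The caller never asks for a batch explicitly; the refill happens here.
            if (buffer.isEmpty() && !server.exhausted()) {
                buffer.addAll(server.nextBatch(batchSize));
            }
            return !buffer.isEmpty();
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return buffer.poll();
        }
    }

    /** Iterates a cursor to exhaustion and returns {documents seen, round trips}. */
    static int[] drain(int matchCount, int batchSize) {
        FakeServer server = new FakeServer(matchCount);
        BatchingCursor cursor = new BatchingCursor(server, batchSize);
        int seen = 0;
        while (cursor.hasNext()) {
            cursor.next();
            seen++;
        }
        return new int[] { seen, server.roundTrips };
    }

    public static void main(String[] args) {
        int[] result = drain(10_000, 500);
        System.out.println(result[0] + " documents in " + result[1] + " batches");
    }
}
```

In this simulation the while loop has the same shape as the loop in the question, and it sees all 10,000 items. The batch size only changes how many items arrive per round trip, not how many the loop visits in total. The real driver and server may differ in details such as the exact number of round trips.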

You stated ...

My collection has 10,000 records but with batchsize set i get only 1-500 records

... if you only get 500 documents then either you stopped iterating after 500 or only 500 documents were deemed relevant to your query.

You can see how many documents are relevant to your query by using the count() method. For example:

int count = collection.find(query).count();

You can also grab all of the documents relevant to your query in one go, without iterating over the DBCursor yourself, like this ...

List<DBObject> docs = collection.find(query).toArray();

... though of course this might have implications for your application's heap since it would result in every document which meets your criteria being stored on-heap in your client (rather than the more memory friendly approach of reading them in batches via the DBCursor).
