user4408375


Java/Grails - MongoDB aggregation 16MB buffer size limit

I am trying to run a MongoDB aggregation query from Java, but the result exceeds the 16MB limit. Is there any way to adjust the buffer size, or is there another workaround? I do not have the option to create a collection on the MongoDB server, and I do not have any MongoDB utilities such as mongo.exe or mongoExport.exe on my client system.

Here is a small part of the code:

if (!datasetObject?.isFlat && jsonFor != 'collection-grid') {
    //mongoPipeline = new AggregateArgs (Pipeline = pipeline, AllowDiskUse = true, OutputMode = AggregateOutputMode.Cursor)
    output = dataSetCollection.aggregate(pipeline)
} else {
    output = dataSetCollection.aggregate(project)
}

I have 100K records with 30 fields each. When I query 5 fields for all 100K records, the query succeeds. But when I query all fields for all 100K records, it throws the error below.

The issue is that retrieving all documents from the collection, including all fields of each document, exceeds the 16MB size limit.

Actual Error:

com.mongodb.CommandFailureException: { "serverUsed" : "127.0.0.1:27017" , "errmsg" : "exception: aggregation result exceeds maximum document size (16MB)" , "code" : 16389 , "ok" : 0.0

How to resolve this issue?

Using MongoDB-3.0.6

Note: GridFS does not fit my requirements, because I need to retrieve all documents in one request, not a single document.

Upvotes: 1

Views: 1904

Answers (2)

user4408375

There are two options to resolve this issue:

1) Use $out, which creates a new collection and writes the result to it. This is not a good idea, because the process is time-consuming and complex to implement.

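import com.mongodb.AggregationOutput;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

import java.net.UnknownHostException;

public class JavaAggregation {
    public static void main(String args[]) throws UnknownHostException {

        MongoClient mongo = new MongoClient();
        DB db = mongo.getDB("databaseName");

        DBCollection coll = db.getCollection("dataset");

        /*
            MONGO SHELL :
            db.dataset.aggregate([
                { "$match": { isFlat : true } },
                { "$out": "datasetTemp" }
            ])
        */

        DBObject match = new BasicDBObject("$match", new BasicDBObject("isFlat", true));
        DBObject out = new BasicDBObject("$out", "datasetTemp");

        // Runs the pipeline; $out writes the matched documents to "datasetTemp"
        AggregationOutput output = coll.aggregate(match, out);

        // Read the results back from the temporary collection with a regular cursor
        DBCollection tempColl = db.getCollection("datasetTemp");
        DBCursor cursor = tempColl.find();

        try {
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        } finally {
            cursor.close();
        }
    }
}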

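2) Use AggregationOptions with OutputMode.CURSOR and allowDiskUse(true). This is very simple to implement and not time-consuming. The cursor output mode is what avoids the 16MB limit on a single result document; allowDiskUse(true) additionally lets large pipeline stages spill to disk on the server.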

import com.mongodb.AggregationOptions;
import com.mongodb.BasicDBObject;
import com.mongodb.Cursor;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

public class JavaAggregation {
    public static void main(String args[]) throws UnknownHostException {

        MongoClient mongo = new MongoClient();
        DB db = mongo.getDB("databaseName");

        DBCollection coll = db.getCollection("dataset");

        /*
            MONGO SHELL :
            db.dataset.aggregate(
                [ { "$match": { isFlat : true } } ],
                { allowDiskUse: true, cursor: { batchSize: 100 } }
            )
        */

        DBObject match = new BasicDBObject("$match", new BasicDBObject("isFlat", true));
        List<DBObject> flatPipeline = Arrays.asList(match);

        AggregationOptions aggregationOptions = AggregationOptions.builder()
                .batchSize(100)
                .outputMode(AggregationOptions.OutputMode.CURSOR)
                .allowDiskUse(true)
                .build();

        // With OutputMode.CURSOR the results arrive in batches instead of one document
        Cursor cursor = coll.aggregate(flatPipeline, aggregationOptions);
        try {
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        } finally {
            cursor.close();
        }
    }
}
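As a rough check of why 5 fields succeed and 30 fields fail, the inline result has to fit in one 16MB document, so only the total payload matters. The sketch below is plain Java with no driver; the class name, the method names and the 100-byte and 600-byte document sizes are illustrative assumptions, not measurements from the dataset, and the estimate ignores BSON array and envelope overhead.

```java
public class InlineResultEstimate {
    // Maximum BSON document size enforced by MongoDB: 16 MiB.
    static final long MAX_BSON_BYTES = 16L * 1024 * 1024;

    // Rough payload size of an inline result; overhead is ignored (assumption).
    static long estimatedBytes(long docCount, long avgDocBytes) {
        return docCount * avgDocBytes;
    }

    // True if the whole result would fit in a single 16 MiB document.
    static boolean fitsInline(long docCount, long avgDocBytes) {
        return estimatedBytes(docCount, avgDocBytes) <= MAX_BSON_BYTES;
    }

    // Largest average document size that still fits inline for docCount documents.
    static long maxAvgDocBytes(long docCount) {
        return MAX_BSON_BYTES / docCount;
    }

    public static void main(String[] args) {
        long docs = 100_000; // record count from the question
        System.out.println("max average bytes per document: " + maxAvgDocBytes(docs));
        // The two sizes below are hypothetical averages, chosen only for illustration.
        System.out.println("100-byte documents fit inline: " + fitsInline(docs, 100));
        System.out.println("600-byte documents fit inline: " + fitsInline(docs, 600));
    }
}
```

With 100K documents the average document has to stay under roughly 167 bytes for an inline result to fit, which a 5-field projection can plausibly meet and a 30-field document usually cannot.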

For more see here and here

Upvotes: 0

evanchooly

Reputation: 6233

When running the aggregation you can tell mongo to return a cursor. With the new APIs in the 3.0 Java driver that would look like this:

// Assuming MongoCollection
dataSetCollection.aggregate(pipeline).useCursor(true)

You might also need to tell it to use disk space on the server rather than doing it all in memory:

// Assuming MongoCollection
dataSetCollection.aggregate(pipeline).useCursor(true).allowDiskUse(true)

If you're using an older driver (or the old API in the new driver) those two options would look like this:

// Assuming DBCollection and a List<DBObject> pipeline
dataSetCollection.aggregate(pipeline, AggregationOptions.builder()
        .allowDiskUse(true)
        .outputMode(AggregationOptions.OutputMode.CURSOR)
        .build())
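To illustrate why a cursor sidesteps the single-document limit, here is a driver-free simulation: the client only ever holds one batch, no matter how large the full result is. BatchedCursor, drain and the fake documents are hypothetical stand-ins invented for this sketch; this is not how the driver is implemented, only the general idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BatchedCursorSketch {
    // Hypothetical stand-in for a server-side cursor that returns results in batches.
    static final class BatchedCursor implements Iterator<String> {
        private final int total;      // total number of documents in the result
        private final int batchSize;  // documents returned per simulated round trip
        private int fetched = 0;      // documents pulled from the "server" so far
        private int batchesFetched = 0;
        private Iterator<String> current = Collections.emptyIterator();

        BatchedCursor(int total, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.total = total;
            this.batchSize = batchSize;
        }

        // Simulates one round trip: only this batch is held in memory.
        private void fetchNextBatch() {
            List<String> batch = new ArrayList<>();
            int end = Math.min(fetched + batchSize, total);
            for (int i = fetched; i < end; i++) {
                batch.add("{ \"_id\" : " + i + " }");
            }
            fetched = end;
            batchesFetched++;
            current = batch.iterator();
        }

        @Override
        public boolean hasNext() {
            if (!current.hasNext() && fetched < total) {
                fetchNextBatch();
            }
            return current.hasNext();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }

        int batchesFetched() {
            return batchesFetched;
        }
    }

    // Consumes the cursor and returns how many documents were seen.
    static int drain(BatchedCursor cursor) {
        int count = 0;
        while (cursor.hasNext()) {
            cursor.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        BatchedCursor cursor = new BatchedCursor(250, 100);
        int seen = drain(cursor);
        System.out.println(seen + " documents in " + cursor.batchesFetched() + " batches");
    }
}
```

The loop in drain has the same shape as the hasNext()/next() loops used with the real driver, so switching to a cursor normally changes how the query is issued, not how the results are consumed.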

Upvotes: 1
