Reputation: 35
In order to iterate over all the documents in a MongoDB (2.6.9) collection using Grails (2.5.0) and the MongoDB Plugin (3.0.2) I created a forEach like this:
class MyObjectService {
    def forEach(Closure func) {
        def criteria = MyObject.createCriteria()
        def ids = criteria.list { projections { id() } }
        ids.each { func(MyObject.get(it)) }
    }
}
Then I do this:
class AnalysisService {
    def myObjectService

    @Async
    def analyze() {
        MyObject.withStatelessSession {
            myObjectService.forEach { myObject ->
                doSomethingAwesome(myObject)
            }
        }
    }
}
This works great...until I hit a collection that is large (>500K documents) at which point a CommandFailureException is thrown because the size of the aggregation result is greater than 16MB.
Caused by CommandFailureException: { "serverUsed" : "foo.bar.com:27017" , "errmsg" : "exception: aggregation result exceeds maximum document size (16MB)" , "code" : 16389 , "ok" : 0.0}
In reading about this, I think that one way to handle this situation is to use the option allowDiskUse in the aggregation function that runs on the MongoDB side so that the 16MB memory limit won't apply and I can get a larger aggregation result.
How can I pass this option to my criteria query? I've been reading the docs and the Javadoc for the Grails MongoDB plugin, but I can't seem to find it. Is there another way to approach the generic problem (iterating over all members of a large collection of domain objects)?
Upvotes: 2
Views: 578
Reputation: 25797
This is not possible with the current implementation of the MongoDB Grails plugin. https://github.com/grails/grails-data-mapping/blob/master/grails-datastore-gorm-mongodb/src/main/groovy/org/grails/datastore/mapping/mongo/query/MongoQuery.java#L957
If you look at the line linked above, you will see that the default options are used to build the AggregationOptions instance, so there is no way to supply your own options.
But there is a hackish way to do it using Groovy's metaclass. Let's do it. :-)
First, store a reference to the original builder() method before running the criteria in your service:
MetaMethod originalMethod = AggregationOptions.metaClass.static.getMetaMethod("builder", [] as Class[])
Then, replace the builder method to provide your implementation.
AggregationOptions.metaClass.static.builder = { ->
    def builderInstance = new AggregationOptions.Builder()
    builderInstance.allowDiskUse(true) // solution to your problem
    return builderInstance
}
Now, when your service method runs the criteria query, it should no longer result in the aggregation error you are getting, since we have now set the allowDiskUse property to true.
Finally, restore the original method so that the change does not affect any other call (optional).
AggregationOptions.metaClass.static.addMetaMethod(originalMethod)
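As a rough sketch of the swap-and-restore pattern above, here it is on a stand-in class, with the restore moved into a finally block so it also runs if the query throws. Options and withDiskUse are hypothetical names, not part of the MongoDB driver or the plugin, and the restore here resets the whole metaclass through GroovySystem.metaClassRegistry.removeMetaClass rather than re-adding the saved MetaMethod. Keep in mind that metaclass overrides are only seen by calls that Groovy dispatches dynamically, so it is worth verifying that the plugin's own call to builder() actually picks up the replacement.

```groovy
// Hypothetical stand-in for a class with a static factory method.
class Options {
    boolean allowDiskUse = false
    static Options builder() { new Options() }
}

// Hypothetical helper: swap the static method, run the work, always restore.
def withDiskUse(Closure work) {
    Options.metaClass.static.builder = { -> new Options(allowDiskUse: true) }
    try {
        return work()
    } finally {
        // Drops every metaclass customization on Options, not just builder().
        GroovySystem.metaClassRegistry.removeMetaClass(Options)
    }
}

def inside = withDiskUse { Options.builder().allowDiskUse }
def after = Options.builder().allowDiskUse

assert inside   // the override was active while the closure ran
assert !after   // the original behaviour is back afterwards
```

The metaclass is shared by every thread, so while the override is in place any other caller that goes through Groovy dispatch sees it too. That matters if the query runs from an @Async method.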
Hope this helps!
Apart from this, why are you pulling all the IDs in the forEach method and then re-fetching each instance with the get() method? You are wasting database queries, which will hurt performance. Also, if you follow the approach below, you don't need the above changes at all.
An example of this (UPDATED):
class MyObjectService {
    void forEach(Closure func) {
        List<MyObject> instanceList = MyObject.createCriteria().list {
            // Your criteria code
            eq("status", "ACTIVE") // an example
        }
        // Don't do any of this
        // println(instanceList)
        // println(instanceList.size())
        // *** explained below
        instanceList.each { myObjectInstance ->
            func(myObjectInstance)
        }
    }
}
(I'm not adding the code of AnalysisService since there is no change.)
*** This is the main point. Whenever you write a criteria query on a domain class (without projections, on Mongo), Grails/gmongo will not immediately fetch the records from the database after executing the criteria code unless you call methods like toString(), size() or dump() on the result.
Now, when you call each on that instance list, you are not actually loading all instances into memory. Instead, you are iterating over a Mongo cursor behind the scenes, and in MongoDB a cursor pulls records from the database in batches, which is very memory safe. So it is safe to call each directly on your criteria result; it will not blow up the JVM unless you call one of the methods that triggers loading all records from the database.
You can confirm this behaviour even in the code: https://github.com/grails/grails-data-mapping/blob/master/grails-datastore-gorm-mongodb/src/main/groovy/org/grails/datastore/mapping/mongo/query/MongoQuery.java#L1775
Whenever you write a criteria query without projections, you get an instance of MongoResultList, which has a method named initializeFully() that is called by toString() and other methods. But you can see that MongoResultList implements iterator, which in turn calls the MongoDB cursor to iterate over the large collection, which is, again, memory safe.
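To make the lazy behaviour concrete, here is a minimal sketch of a result list that streams from a cursor when iterated and only materializes everything when size() or toString() is called. LazyResultList and the counting cursor are hypothetical stand-ins; this is not the plugin's MongoResultList code, only an illustration of the same idea.

```groovy
// Hypothetical illustration of a lazily loaded result list.
class LazyResultList<T> implements Iterable<T> {
    private final Iterator<T> cursor   // stand-in for a MongoDB cursor
    private List<T> loaded = null

    LazyResultList(Iterator<T> cursor) { this.cursor = cursor }

    // Iterating streams straight from the cursor; no full list is built.
    Iterator<T> iterator() { loaded != null ? loaded.iterator() : cursor }

    // These need every element, so they drain whatever is left in the cursor.
    int size() { initializeFully(); loaded.size() }
    String toString() { initializeFully(); loaded.toString() }

    private void initializeFully() {
        if (loaded == null) { loaded = cursor.toList() }
    }
}

// A fake cursor over 1000 items that counts how many were actually fetched.
int fetched = 0
def source = (1..1000).iterator()
def countingCursor = [
    hasNext: { source.hasNext() },
    next   : { fetched++; source.next() }
] as Iterator

def results = new LazyResultList<Integer>(countingCursor)
def firstThree = []
for (item in results) {
    firstThree << item
    if (firstThree.size() == 3) break
}

assert firstThree == [1, 2, 3]
assert fetched == 3   // only the iterated items were pulled from the cursor
```

Calling results.size() at this point would drain the remaining items from the fake cursor in one go, which is the kind of call the answer recommends avoiding on a large collection.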
Hope this helps!
Upvotes: 0