James Allman

Reputation: 41188

Grails batch read optimization

I need to re-index all of my domain instances from a live legacy database on a regular basis using a custom ElasticSearch implementation.

All of the suggestions I have found so far have been for optimizing batch write operations.

Many of the tables have hundreds of thousands, if not millions, of records, and the simple solution of domain.list().each { ... } appears to load the entire dataset into memory at once.

Using Hibernate ScrollableResults seems like a good solution, but it works at the ResultSet level and doesn't recreate the actual domain objects. I could use the id field to read() each instance, but that seems inefficient.
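
For concreteness, here's a minimal sketch of that approach, assuming a Book domain class and access to the Hibernate sessionFactory bean (both names are illustrative, not from the question):

    import org.hibernate.ScrollMode
    import org.hibernate.ScrollableResults

    // Scroll over the ids only, then rehydrate each instance with read().
    def session = sessionFactory.currentSession
    ScrollableResults ids = session
            .createQuery("select b.id from Book b")
            .scroll(ScrollMode.FORWARD_ONLY)
    while (ids.next()) {
        Long id = ids.get(0) as Long   // first (and only) column of the row
        Book book = Book.read(id)      // extra query per row -- the inefficiency noted above
        // index book ...
    }
    ids.close()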

The Searchable plugin seems to perform an efficient reindex operation.

What other options or optimizations are there for batch reading domain instances successfully?


UPDATE

I recently stumbled across a four-year-old thread on the Grails mailing list with an answer from @BurtBeckwith suggesting that a ScrollableResultSet would be inefficient. He recommends paging through the records using standard GORM methods and clearing the session in batches.
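
A minimal sketch of that pattern, roughly following the advice in the thread (Book and the injected sessionFactory bean are illustrative names):

    import org.hibernate.SessionFactory

    SessionFactory sessionFactory   // injected Spring bean, e.g. in a service

    int batchSize = 500
    int total = Book.count()

    for (int offset = 0; offset < total; offset += batchSize) {
        Book.list(max: batchSize, offset: offset, sort: 'id').each { Book book ->
            // index book ...
        }
        // Clear the first-level cache so it doesn't grow with every batch;
        // add a flush() first if the batch modified any instances.
        sessionFactory.currentSession.clear()
    }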

Upvotes: 0

Views: 1514

Answers (2)

Sudhir N

Reputation: 4096

You can use a utility class similar to the one below to manually load records in batches:

import grails.util.Holders
import org.apache.log4j.Logger
import org.hibernate.SessionFactory

class Paginator {
    private static final Logger log = Logger.getLogger("grails.app.Paginator")

    int batchSize
    int totalCount

    void eachPage(Closure c) {
        SessionFactory sessionFactory = Holders.applicationContext.sessionFactory

        if (totalCount > 0) {
            (0..totalCount - 1).step(batchSize) { int offset ->
                log.debug "Executing batch $offset to ${offset + batchSize}"
                try {
                    c.call(batchSize, offset)
                    // Evict the batch from the first-level cache so memory stays flat.
                    sessionFactory.currentSession.clear()
                } catch (Exception e) {
                    log.error "Error encountered while executing batch [$offset - ${offset + batchSize}]", e
                }
            }
        }
    }
}

And then

Paginator p = new Paginator(batchSize: 500, totalCount: Domain.count())
p.eachPage { int max, int offset ->
    List rows = Domain.list(max: max, offset: offset)
    // do something with the records
}

You can use the paginator with GORM, raw SQL, or anything else.

Upvotes: 0

Jarred Olson

Reputation: 3243

1) Using a traditional Java for loop will save you a little bit of overhead.

2) Chunking the data might help. You can use max: and offset: to fetch chunks of 100 or 1000, so that you're always dealing with a smaller set: http://grails.org/doc/latest/ref/Domain%20Classes/list.html There might be issues with this, though, if you're modifying the objects in that table or if new data gets added in the middle of a workflow. (I'll leave that for you to test.)

3) Using SQL or HQL to get hold of all of the ids and then using Grails' load() might help as well (a rough sketch follows this list): http://grails.org/doc/latest/ref/Domain%20Classes/load.html

4) I've tried to implement some large batch jobs using Grails/Hibernate before and found that it just didn't work well. I'd advise you to work directly with the database using SQL if at all possible; it will be by far the quickest.
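
As a rough illustration of option 3 (untested; Book is a stand-in domain class):

    // Fetch only the ids with HQL, then use load(), which returns a lazy
    // proxy without hitting the database until a property is accessed.
    List<Long> ids = Book.executeQuery("select b.id from Book b")
    ids.collate(1000).each { List<Long> batch ->
        batch.each { Long id ->
            Book book = Book.load(id)   // proxy; first property access triggers the select
            // index book ...
        }
        Book.withSession { session -> session.clear() }   // keep the session small
    }

Note that the proxy still issues one select per instance once you touch its properties, so this mainly bounds memory rather than reducing query count.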

Upvotes: 1
