janDro

Reputation: 1466

Insert 10,000,000+ rows in grails

I've read a lot of articles recently about populating a Grails table from a huge data set, but I seem to have hit a ceiling. My code is as follows:

import groovy.sql.Sql

class LoadingService {
    def sessionFactory
    def dataSource
    def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP

    def insertFile(fileName) {
        InputStream inputFile = getClass().classLoader.getResourceAsStream(fileName)
        def pCounter = 1
        def mCounter = 1
        Sql sql = new Sql(dataSource)
        inputFile.splitEachLine(/\n|\r|,/) { line -> 
            line.each { value ->
                if (value.equalsIgnoreCase('0')) { // skip zeros, but keep the column counter in step
                    pCounter++
                    return
                }
                sql.executeInsert("insert into Patient_MRNA (patient_id, mrna_id, value) values (${pCounter}, ${mCounter}, ${value.toFloat()})")
                pCounter++
            }
            if(mCounter % 100 == 0) {
                cleanUpGorm()
            }
            pCounter = 1
            mCounter++
        }
    }

    def cleanUpGorm() {
        // Clear the Hibernate session and Grails' property instance map
        // so neither grows without bound during the import
        sessionFactory.currentSession.clear()
        propertyInstanceMap.get().clear()
    }
}

I have disabled the second-level cache, I'm using assigned ids, and I am explicitly modelling this many-to-many relationship through a join domain class rather than hasMany and belongsTo (a sketch of what I mean is below).
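For context, a minimal sketch of that kind of join domain class (the class and property names here are assumptions inferred from the insert statement above, not the actual code):

// Hypothetical join domain class for the Patient_MRNA table;
// names inferred from the insert statement, not from real code
class PatientMrna {
    long patientId
    long mrnaId
    float value

    static mapping = {
        table 'Patient_MRNA'
        id generator: 'assigned' // assigned ids, set by the import code
        version false            // skip the optimistic-locking column
    }
}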

My speed increased monumentally after applying these changes, but after a while the inserts slow down almost to a stop, compared with roughly 623,000 rows per minute at the start.

Is there some other memory leak that I should be aware of or have I just hit the ceiling in terms of batch inserts in grails?

To be clear, it takes about 2 minutes to insert the first 1.2 million rows, but then the inserts start to slow down.

Upvotes: 1

Views: 945

Answers (2)

wwwclaes

Reputation: 1152

I have fought with this in earlier versions of Grails. Back then I resorted either to simply running the batch manually in proper chunks, or to using another tool for the batch import, such as Pentaho Data Integration (or another ETL tool, or DIY).

Upvotes: 1

Davor Pecet

Reputation: 171

Try doing batch inserts; it's much more efficient:

def updateCounts = sql.withBatch { stmt ->
    stmt.addBatch("insert into TABLENAME ...")
    stmt.addBatch("insert into TABLENAME ...")
    stmt.addBatch("insert into TABLENAME ...")
    ...
}
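A parameterized variant adapted to the question's table might look like the following (a sketch only; the batch size of 100 and the shape of the row data are assumptions):

import groovy.sql.Sql

// Sketch: parameterized batch insert, sending 100 statements per JDBC
// round trip. withBatch flushes any remainder when the closure returns.
def insertRows(Sql sql, List<List> rows) {
    // rows is assumed to be a list of [patientId, mrnaId, value] triples
    sql.withBatch(100, 'insert into Patient_MRNA (patient_id, mrna_id, value) values (?, ?, ?)') { ps ->
        rows.each { row -> ps.addBatch(row) }
    }
}

This keeps each insert a single prepared statement and cuts the number of JDBC round trips, which is usually where a per-row executeInsert loop loses its time.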

Upvotes: 3
