Reputation: 290
I am importing a large amount of data from a CSV file (the file size is over 100 MB).
The code I'm using looks like this:
def errorLignes = []
def index = 1
csvFile.toCsvReader(['charset': 'UTF-8']).eachLine { tokens ->
    if (index % 100 == 0) cleanUpGorm()
    index++
    def order = Orders.findByReferenceAndOrganization(tokens[0], organization)
    if (!order) {
        order = new Orders()
    }
    if (tokens[1]) {
        def user = User.findByReferenceAndOrganization(tokens[1], organization)
        if (user) {
            order.user = user
        } else {
            errorLignes.add(tokens)
        }
    }
    if (tokens[2]) {
        def customer = Customer.findByCustomCodeAndOrganization(tokens[2], organization)
        if (customer) {
            order.customer = customer
        } else {
            errorLignes.add(tokens)
        }
    }
    if (tokens[3]) {
        order.orderType = Integer.parseInt(tokens[3])
    }
    // etc.....................
    order.save()
}
and I'm using the cleanUpGorm method to clean the session after every 100 entries:
def cleanUpGorm() {
    println "clean up gorm"
    def session = sessionFactory.currentSession
    session.flush()
    session.clear()
    propertyInstanceMap.get().clear()
}
I also turned the second-level cache off:
hibernate {
    cache.use_second_level_cache = false
    cache.use_query_cache = false
    cache.provider_class = 'net.sf.ehcache.hibernate.EhCacheProvider'
}
The Grails version of the project is 2.0.4, and I am using MySQL as the database.
For every entry, I am doing three calls to a finder, and finally I'm saving the order instance.
The import process is too slow, and I am wondering how I can speed it up and optimise this code.
I found that the searchable plugin also makes it slower, so to get around this I used the command:
searchableService.stopMirroring()
But it is still not fast enough, so I am finally changing the code to use Groovy SQL instead.
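For anyone curious what the Groovy SQL replacement can look like, here is a minimal sketch. The connection settings and the orders table/column names are placeholders, and only the first and fourth CSV columns are mapped, to keep the example short:

```groovy
import groovy.sql.Sql

// Placeholder connection settings -- substitute your own datasource details
def sql = Sql.newInstance('jdbc:mysql://localhost/mydb', 'user', 'password',
        'com.mysql.jdbc.Driver')

// withBatch groups the inserts into JDBC batches of 100, bypassing the
// Hibernate session (and its dirty-checking overhead) entirely
sql.withBatch(100, 'insert into orders (reference, order_type) values (?, ?)') { ps ->
    csvFile.toCsvReader(['charset': 'UTF-8']).eachLine { tokens ->
        ps.addBatch([tokens[0], tokens[3] ? Integer.parseInt(tokens[3]) : null])
    }
}
sql.close()
```

Note that this does raw inserts only; the lookup/update logic from the original loop would have to be reimplemented with explicit queries.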
Upvotes: 1
Views: 5344
Reputation: 279
I used batch inserts when inserting records; this is much faster than the GORM clean-up method. The example below shows how to implement it:
import groovy.time.TimeCategory

import org.hibernate.Session
import org.hibernate.Transaction

Date startTime = new Date()
Session session = sessionFactory.openSession()
Transaction tx = session.beginTransaction()
(1..50000).each { counter ->
    Person person = new Person()
    person.firstName = "abc"
    person.middleName = "abc"
    person.lastName = "abc"
    person.address = "abc"
    person.favouriteGame = "abc"
    person.favouriteActor = "abc"
    session.save(person)
    if (counter.mod(100) == 0) {
        session.flush()
        session.clear()
    }
    if (counter.mod(10000) == 0) {
        Date endTime = new Date()
        println "Record inserted Counter => " + counter + " Time => " + TimeCategory.minus(endTime, startTime)
    }
}
tx.commit()
session.close()
Upvotes: 1
Reputation: 11861
I'm also in the process of writing a number of services that will load very large datasets (multiple files of up to ~17 million rows each). I initially tried the cleanUpGorm
method you use, but found that, whilst it did improve things, the loading was still slow. Here's what I did to make it much faster:
Investigate what it is that is causing the app to actually be slow. I installed the Grails Melody plugin, then did a run-app, then opened a browser at /monitoring. I could then see which routines took time to execute and what the worst-performing queries actually were.
Many of the Grails GORM methods map to a SQL ... where ... clause. You need to ensure that you have an index for each item used in a where clause, for each query that you want to make faster; otherwise the method will become considerably slower the bigger your dataset is. This includes putting indexes on the id and version columns that are injected into each of your domain classes.
Ensure you have indexes set up for all of your hasMany and belongsTo relationships.
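In a Grails domain class this can be declared in the mapping block. A sketch, assuming the poster's Orders/Organization classes (the index name is made up):

```groovy
class Orders {
    String reference
    Organization organization

    static mapping = {
        // The two columns used by findByReferenceAndOrganization
        // share one composite index
        reference    index: 'orders_ref_org_idx'
        organization index: 'orders_ref_org_idx'
    }
}
```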
If the performance is still too slow, use Spring Batch. Even if you've never used it before, it should take you no time at all to set up a batch parse of a CSV file to parse into Grails domain objects. I suggest you use the grails-spring-batch plugin to do this and use the examples here to get a working implementation going quickly. It's extremely fast, very configurable, and you don't have to mess around with cleaning up the session.
Upvotes: 1
Reputation: 571
I found this blog entry very useful: http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql/
You are already cleaning up GORM, but try cleaning every 100 entries:
def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP
propertyInstanceMap.get().clear()
Creating database indexes might help as well. Also, use default-storage-engine=innodb instead of MyISAM.
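The storage engine default is a MySQL server setting. Assuming you control the server, it goes in my.cnf (a config sketch; the file location varies by install):

```ini
[mysqld]
# Make new tables InnoDB by default; MyISAM takes table-level locks on writes
default-storage-engine = innodb
```

Alternatively, you can have Hibernate create InnoDB tables by setting the dialect in DataSource.groovy to org.hibernate.dialect.MySQL5InnoDBDialect.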
Upvotes: 1