Reputation: 5892
I have a question that's probably fairly simple for many of you.
In my Ruby on Rails application, I'll be importing 20k+ rows of data into the database quite often. The component of the application where this happens takes a variable containing the data as a list, loops through it, and calls Model.create(data) for each row.
I've noticed that this can take a good 1.5 minutes or so for about 17k rows of data.
So essentially, it looks something similar to this:
@items = []

import_file = File.open('file')
data = import_file.read.split("\n")

data.each do |item|
  name = item.scan(/<name>(.*?)<\/name>/)[0][0]
  address = item... etc
  @items << {
    :name => name,
    :address => address,
    etc
  }
end

# 17k separate INSERT statements, one per Model.create call
@items.each do |row|
  Model.create(row)
end
When watching this in the Rails server console I can see all 17k individual INSERT statements go by, and deleting that many rows takes even longer.
I'm sure this is very inefficient, so I've come to see if anyone has any suggestions or if this is just pretty normal for the amount of data that's being imported.
Upvotes: 1
Views: 50
Reputation: 2002
A good fit for such mass/bulk inserts seems to be the activerecord-import gem. Someone has also published a benchmark of it.
According to its introductory example, you could do something like this:
@items = []

import_file = File.open('file')
data = import_file.read.split("\n")

data.each do |item|
  name = item.scan(/<name>(.*?)<\/name>/)[0][0]
  address = item... etc
  @items << Model.new(
    :name => name,
    :address => address,
    etc
  )
end

# One bulk INSERT for all rows instead of one INSERT per row
Model.import(@items)
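If instantiating thousands of ActiveRecord objects is still too slow, activerecord-import also accepts a column list plus an array of value rows, which skips model instantiation entirely. A rough sketch, assuming the same name/address fields as above (the column list and the <address> regex are placeholders for whatever your file actually contains):

columns = [:name, :address]

values = data.map do |item|
  name = item.scan(/<name>(.*?)<\/name>/)[0][0]
  address = item.scan(/<address>(.*?)<\/address>/)[0][0]
  [name, address]
end

# Builds one multi-row INSERT from the raw arrays, skipping model instantiation and validations
Model.import(columns, values, validate: false)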
The manual way, building a single SQL INSERT statement yourself:
@items = []

import_file = File.open('file')
data = import_file.read.split("\n")

data.each do |item|
  name = item.scan(/<name>(.*?)<\/name>/)[0][0]
  address = item... etc
  # String values need to be quoted (and ideally escaped) before going into SQL
  @items << "('#{name}', '#{address}', ...)"
end

sql = "INSERT INTO models (`name`, `address`, ...) VALUES #{@items.join(", ")}"
ActiveRecord::Base.connection.execute(sql)
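If you build the SQL by hand like this, keep in mind that interpolating raw strings breaks on embedded quotes and is open to SQL injection. A safer sketch, again assuming the same name/address fields, runs each value through the connection's quote method:

rows = data.map do |item|
  # quote() escapes the value and wraps it in the proper quoting for the adapter
  name = ActiveRecord::Base.connection.quote(item.scan(/<name>(.*?)<\/name>/)[0][0])
  address = ActiveRecord::Base.connection.quote(item.scan(/<address>(.*?)<\/address>/)[0][0])
  "(#{name}, #{address})"
end

sql = "INSERT INTO models (`name`, `address`) VALUES #{rows.join(', ')}"
ActiveRecord::Base.connection.execute(sql)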
Upvotes: 1