tomekfranek

Reputation: 7099

How do I optimise getting and updating the id for 500000 records?

I have a CSV file that contains data such as the user's id, unit, and size.

I want to update member_id for 500,000 products:

# tmp holds the parsed CSV rows; ID, UNIT and SIZE are column indexes
500000.times do |i|
  row  = tmp[i]
  user = User.find_by(id: row[ID])   # one SELECT per row

  hash = {
    unit: row[UNIT],
    size: row[SIZE]
  }
  hash.merge!(user_id: user.id) if user.present?

  Product.create(hash)               # one INSERT (its own transaction) per row
end

How do I optimize this so that I don't look up each User object individually, but instead work with an array of related hashes?

Upvotes: 0

Views: 85

Answers (1)

tadman

Reputation: 211540

There are two things here that are massively holding back performance. First, you're doing N individual User lookups, one per row, which is totally out of control. Second, you're creating records one at a time instead of doing a mass insert, so each INSERT runs inside its own tiny transaction block.
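For the first problem, a minimal sketch (assuming tmp holds the parsed CSV rows and that ID, UNIT and SIZE are the column indexes from the question) is to fetch every referenced user in one query and build the product rows in memory:

require "set"

# One User query for the whole file instead of one lookup per row.
ids      = tmp.map { |row| row[ID].to_i }
user_ids = User.where(id: ids).pluck(:id).to_set

rows = tmp.map do |row|
  attrs = { unit: row[UNIT], size: row[SIZE] }
  attrs[:user_id] = row[ID].to_i if user_ids.include?(row[ID].to_i)
  attrs
end

With the rows built in memory, the second problem becomes turning them into as few INSERT statements as possible.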

Generally these sorts of bulk operations are better done purely in the SQL domain. You can insert a very large number of rows at the same time, often only limited by the size of the query you can submit, and that parameter is usually adjustable.
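A sketch of what that can look like from Rails, with the products table and its user_id, unit and size columns assumed from the question, and every value quoted through the connection:

conn   = Product.connection
values = rows.map do |r|
  "(#{conn.quote(r[:user_id])}, #{conn.quote(r[:unit])}, #{conn.quote(r[:size])})"
end

# If products has NOT NULL created_at/updated_at columns, add them to the list too.
conn.execute(<<~SQL)
  INSERT INTO products (user_id, unit, size)
  VALUES #{values.join(', ')}
SQL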

While a gigantic query may lock or block your database for a period of time, it will be the fastest way to do your updates. If you need to keep your system running during mass inserts, you'll need to break it up into a series of smaller commits.
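One way to do that, continuing the sketch above with the rows array already built, is to slice the data and run one INSERT per slice:

conn = Product.connection

# Commit in slices so no single statement locks the table for long;
# 5,000 rows per INSERT is an assumption to tune for your database.
rows.each_slice(5_000) do |slice|
  values = slice.map do |r|
    "(#{conn.quote(r[:user_id])}, #{conn.quote(r[:unit])}, #{conn.quote(r[:size])})"
  end
  conn.execute("INSERT INTO products (user_id, unit, size) VALUES #{values.join(', ')}")
end

Each execute here is its own small statement, so the database can keep serving other traffic between slices.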

Remember that Product.connection is a lower-level access layer that lets you manipulate the data directly with SQL queries.
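The quoting used in the sketches above comes from that layer; connection.quote handles strings, numbers and NULLs so the generated SQL stays valid (the exact output depends on the adapter):

conn = Product.connection
conn.quote("O'Brien")  # => "'O''Brien'" on PostgreSQL
conn.quote(42)         # => "42"
conn.quote(nil)        # => "NULL"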

Upvotes: 4
