codr

Reputation: 914

Efficient bulk update of Rails database

I'm trying to build a rake utility that will update my database every so often.

This is the code I have so far:

namespace :utils do

  # utils:update_ip
  # Downloads the file from <url> to the temp folder, then unzips it into <file_path>
  # Then updates the database.

  desc "Update ip-to-country database"
  task :update_ip => :environment do

    require 'open-uri'
    require 'zip/zipfilesystem'
    require 'csv'

    file_name = "ip-to-country.csv"
    file_path = "#{RAILS_ROOT}/db/#{file_name}"
    url = 'http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip'


    # Check when we last updated the database (via the CSV file's mtime).
    mod_time = ''
    mod_time = File.new(file_path).mtime.httpdate if File.exists?(file_path)

    begin
      puts 'Downloading update...'
      #send conditional GET to server
      zipped_file = open(url, {'If-Modified-Since' => mod_time})
    rescue OpenURI::HTTPError => the_error
      if the_error.io.status[0] == '304'
        puts 'Nothing to update.'
      else
        puts 'HTTPError: ' + the_error.message
      end
    else # file was downloaded without error.

      Rails.logger.info 'ip-to-country: Remote database was last updated: ' + zipped_file.meta['last-modified']
      delay = Time.now - zipped_file.last_modified
      Rails.logger.info "ip-to-country: Database was outdated for: #{delay} seconds (#{delay / 60 / 60 / 24 } days)"

      puts 'Unzipping...'
      File.delete(file_path) if File.exists? file_path
      Zip::ZipFile.open(zipped_file.path) do |zipfile|
        zipfile.extract(file_name, file_path)
      end

      Iptocs.delete_all

      puts "Importing new database..."


      # TODO: way, way too heavy find a better solution.


      CSV.open(file_path, 'r') do |row|
        ip = Iptocs.new(  :ip_from        => row.shift,
                        :ip_to          => row.shift,
                        :country_code2  => row.shift,
                        :country_code3  => row.shift,
                        :country_name   => row.shift)
        ip.save
      end #CSV
      puts "Complete."

    end #begin-rescue
  end #task
end #namespace

The problem I'm having is that it takes a few minutes to enter the 100,000-plus entries. I'd like to find a more efficient way of updating my database. Ideally the solution will remain independent of the database type, but if that's not possible, my production server will be running MySQL.

Thank you for any insight.

Upvotes: 5

Views: 11879

Answers (5)

reto

Reputation: 16732

I'm currently experimenting with activerecord-import, which sounds very promising:

https://github.com/zdennis/activerecord-import
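For example, the question's row-by-row loop could become a single bulk insert. A minimal sketch, assuming the Iptocs model and CSV column order from the question (file_path is the path the rake task already computes):

require 'csv'

columns = [:ip_from, :ip_to, :country_code2, :country_code3, :country_name]
rows    = CSV.read(file_path)  # array of arrays, one per CSV line

# activerecord-import turns this into one (or a few) multi-row INSERTs.
Iptocs.import(columns, rows, :validate => false)

Skipping validations avoids instantiating a model per row, which is where much of the time goes.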

Upvotes: 1

Harish Shetty

Reputation: 64363

Have you tried AR Extensions for bulk import? You get impressive performance improvements when inserting thousands of rows into the DB. Visit their website for more details.

Refer to these examples for more information

Usage Example 1

Usage Example 2

Usage Example 3
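As a rough sketch of the kind of call ar-extensions enables (assuming the question's Iptocs model; the gem adds an import class method that also accepts an array of model instances):

require 'csv'

ips = []
CSV.foreach(file_path) do |row|
  ips << Iptocs.new(:ip_from       => row[0],
                    :ip_to         => row[1],
                    :country_code2 => row[2],
                    :country_code3 => row[3],
                    :country_name  => row[4])
end

# One bulk INSERT instead of one INSERT per record.
Iptocs.import(ips)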

Upvotes: 9

Alex Reisner

Reputation: 29432

As Larry says, use your DB-specific import utilities if the file comes in the format you want. However, if you need to manipulate the data before inserting, you can generate a single INSERT query with data for many rows, which is faster than using a separate query for each row (as ActiveRecord will do). For example:

INSERT INTO iptocs (ip_from, ip_to, country_code) VALUES
  ('xxx', 'xxx', 'xxx'),
  ('yyy', 'yyy', 'yyy'),
  ...;
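In the rake task, that might look like the following sketch (it reuses the question's file_path and iptocs table; connection.quote handles the escaping):

require 'csv'

conn = ActiveRecord::Base.connection
values = CSV.read(file_path).map do |row|
  "(" + row[0, 5].map { |v| conn.quote(v) }.join(", ") + ")"
end

# Batch the rows so each statement stays under MySQL's max_allowed_packet.
values.each_slice(1000) do |batch|
  conn.execute("INSERT INTO iptocs (ip_from, ip_to, country_code2, " +
               "country_code3, country_name) VALUES #{batch.join(', ')}")
end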

Upvotes: 0

Larry K

Reputation: 49104

Use the database-level utilities for high speed, Luke!

Unfortunately, they're DB-specific, but they are fast. For MySQL, see http://dev.mysql.com/doc/refman/5.1/en/load-data.html
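As a sketch, the question's whole CSV loop could collapse into a single statement executed from the task (assuming the iptocs table and column order from the question, and a server/client configured to allow LOCAL INFILE):

sql = <<-SQL
  LOAD DATA LOCAL INFILE '#{file_path}'
  INTO TABLE iptocs
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  (ip_from, ip_to, country_code2, country_code3, country_name)
SQL
ActiveRecord::Base.connection.execute(sql)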

Upvotes: 3

Zepplock

Reputation: 29135

You could generate a text file with all the INSERTs you need and then execute:

mysql -u user -p db_name < mytextfile.txt

Not sure if this will be any faster, but it's worth a try...
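A quick sketch of generating such a file from the question's CSV (hypothetical file names; single quotes are escaped by doubling them, which MySQL accepts):

require 'csv'

File.open("mytextfile.txt", "w") do |out|
  CSV.foreach("ip-to-country.csv") do |row|
    quoted = row[0, 5].map { |v| "'#{v.to_s.gsub("'", "''")}'" }.join(", ")
    out.puts "INSERT INTO iptocs (ip_from, ip_to, country_code2, " +
             "country_code3, country_name) VALUES (#{quoted});"
  end
end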

Upvotes: 1
