Patm

Reputation: 2075

Updating large amounts of data in Rails App

I have a Rails app with a table of about 30 million rows that I build from a text file my data provider gives me quarterly. From there I do some manipulation and comparison with some other tables and create an additional table with more customized data.

The first time I did this, I ran a Ruby script through the Rails console. It was slow and obviously not the best way.

What is the best way to streamline this process and update it on my production server with no downtime, or at least as little as possible?

This is the process I'm thinking is best for now:

  1. Create rake tasks for reading in the data. Use the activerecord-import gem to do batched writes and to turn off ActiveRecord validations. Load this data into brand-new duplicate tables (rough sketch after this list).

  2. Build indexes on the newly created tables.

  3. Rename the newly created tables to the names the Rails app is looking for.

  4. Delete the old tables.
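Here's roughly what I'm picturing for step 1. The model, table, column, and file names are just placeholders for my real ones, and it assumes activerecord-import is in the Gemfile:

```ruby
# lib/tasks/quarterly_import.rake -- rough sketch of step 1.
# Model, column, and file names are placeholders.
namespace :quarterly do
  desc "Bulk-load the provider's file into the new duplicate table"
  task load: :environment do
    batch = []
    File.foreach("/data/provider_dump.txt") do |line|
      fields = line.chomp.split("\t")
      batch << DataRowNew.new(code: fields[0], value: fields[1])
      next unless batch.size >= 10_000

      # validate: false skips ActiveRecord validations, which is most
      # of the point of using activerecord-import on a load this size
      DataRowNew.import(batch, validate: false)
      batch.clear
    end
    DataRowNew.import(batch, validate: false) if batch.any?
  end
end
```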

I'm planning on doing all of this right on the production server.

Is there a better way to do this?

Upvotes: 2

Views: 1395

Answers (2)

Cade Roux

Reputation: 89651

Depending on your logic, I would seriously consider processing the data in the database using SQL. This keeps the work close to the data; 30m rows is typically not something you want to pull out of the database and compare against other data you have also pulled out of the database.

So think outside of the Ruby on Rails box.

SQL has built-in capabilities to join, compare, insert, and update data; these can be very powerful and fast, because the processing happens right where the data lives.
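For example, something along these lines builds your customized table entirely in the database. Every table and column name here is invented, since I don't know your schema:

```ruby
# Build the customized table with a single INSERT ... SELECT, joining
# the freshly loaded rows against an existing lookup table. All names
# are invented for illustration. You could equally run this SQL in the
# database console, outside Rails entirely.
ActiveRecord::Base.connection.execute(<<~SQL)
  INSERT INTO customized_rows (code, value, category)
  SELECT n.code, n.value, l.category
  FROM   data_rows_new n
  JOIN   lookup_values l ON l.code = n.code
SQL
```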

Upvotes: 1

TomDunning

Reputation: 4877

1) Create my_table_new as an empty clone of my_table.

2) Import the file (in batches of x lines) into my_table_new, building indexes as you go.

3) Run: RENAME TABLE my_table TO my_table_old, my_table_new TO my_table;

Doing the rename as one command makes it effectively instant, so there is virtually no downtime. I've done this with large data sets, and since the rename is the 'switch', you should retain uptime throughout.
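From a rake task or migration, the swap looks like this (MySQL syntax; the DROP at the end is the cleanup step, to be run only once you've verified the new data):

```ruby
# Both renames happen in a single atomic statement (MySQL), so readers
# never see a missing table -- this is the 'switch'.
ActiveRecord::Base.connection.execute(
  "RENAME TABLE my_table TO my_table_old, my_table_new TO my_table"
)

# Once you've verified the new data, clean up the old copy.
ActiveRecord::Base.connection.execute("DROP TABLE my_table_old")
```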

Upvotes: 1
