K M Rakibul Islam

Reputation: 34338

Heroku database access and modification is too expensive/time-consuming? How to make it faster?

I am working on a Ruby on Rails project where I scrape and save data every day from sites that publish valuable data daily but don't store it for long. I scrape that data and save it to our database to build time-series data sets. We have almost 75 assets and, for each asset, about 20 years of historical data on average.

There was an error in a previous algorithm, and because of it all of our historical data is now incorrect. I found the problem and rewrote the algorithm, which is working perfectly now. I tried modifying the database for the previous 1 year of historical data (for 1 asset) and it works correctly. The problem is speed: when I run the update locally it takes about 10 minutes for 1 asset with 1 year of historical data, and the same thing on Heroku takes about 37 minutes, which seems very long. Since we have 75 assets, each with 20 years of historical data, I estimate it would take 75 * 20 * 37 = 55,500 minutes = 925 hours!!! That does not seem feasible to me. Still, the data is very valuable to us, so we need to correct all of the historical data we have.

I am using a PostgreSQL database both locally and on Heroku. My suspicion is that Rails ActiveRecord is not designed for this kind of work and is very expensive for it. What should I do in this situation? What would be the optimal solution to my problem, and how can I make this task much faster while still serving my purpose completely? Any kind of suggestion/idea is appreciated.

Upvotes: 1

Views: 327

Answers (2)

Adrien Coquio

Reputation: 4930

I have already encountered this kind of problem and I used Sequel to address it. It lets you write your translation algorithm in Ruby but without the heavy ActiveRecord features.
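For a concrete sense of what that looks like, here is a minimal sketch; the `prices` table, its `asset_id`, `raw_value` and `value` columns, and the `* 1.05` formula are invented for the example rather than taken from the question:

```ruby
require "sequel"

# Connect with Sequel directly; DATABASE_URL is the config var Heroku
# already sets for its Postgres add-on.
DB = Sequel.connect(ENV.fetch("DATABASE_URL"))
prices = DB[:prices] # hypothetical table holding the daily time series

# If the corrected formula can be written as a column expression, one
# set-based UPDATE per asset replaces thousands of per-record saves.
# `raw_value * 1.05` is only a stand-in for the real correction.
prices.where(asset_id: 1).update(value: Sequel[:raw_value] * 1.05)
```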

If the algorithm written with Sequel still takes too much time, you will have to write straight SQL, as @mu is too short suggested. It will probably be a lot easier to translate the Sequel code into raw SQL than the ActiveRecord code.
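If you do end up at raw SQL, the statement can still be issued from a small Ruby script; the table and the formula below are the same placeholders as above:

```ruby
require "sequel"

DB = Sequel.connect(ENV.fetch("DATABASE_URL"))

# A single set-based UPDATE lets PostgreSQL do the work server-side
# instead of shipping every row back and forth over the wire.
DB.run(<<~SQL)
  UPDATE prices
  SET    value = raw_value * 1.05  -- stand-in for the corrected formula
  WHERE  asset_id = 1;
SQL
```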

Last, each script you run will execute on a single Heroku dyno, which has limited capacity, so it may be better to run the fix locally and upload your corrected database to Heroku rather than running the script directly on Heroku. There may also be some Heroku add-ons that can give you more resources.

Upvotes: 1

Chazu

Reputation: 1018

There are a couple of things you should consider. As mu is too short alluded to in a comment above, it may be helpful to get rid of whatever overhead Rails is adding. You can do this by using the Sequel gem to write a rake task that accesses your database with less overhead. Sequel provides a fairly simple API that can help you write efficient queries without the unintuitive syntax of SQL and without the overhead of ActiveRecord; a rough sketch follows below.
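Here is roughly what such a Sequel-backed rake task could look like; the file name, task name, table/column names, and the `corrected_value` helper are all assumptions for illustration:

```ruby
# lib/tasks/fix_history.rake -- file, task, and table/column names
# are invented for this sketch.
require "sequel"

namespace :data do
  desc "Rewrite historical values with the corrected algorithm"
  task :fix_history do
    db = Sequel.connect(ENV.fetch("DATABASE_URL"))

    db.transaction do
      # paged_each fetches rows in batches, so 20 years of data per
      # asset never has to sit in memory all at once.
      db[:prices].where(asset_id: 1).order(:id).paged_each do |row|
        db[:prices].where(id: row[:id])
                   .update(value: corrected_value(row[:raw_value])) # hypothetical fixed algorithm
      end
    end
  end
end
```

Batching with `paged_each` keeps memory flat; if the fix can be expressed as a single column expression, a dataset-level update per asset will be faster still.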

I'm not savvy enough about Heroku's internals, but another thing to consider is whether the Heroku instance running your code will be able to do the heavy lifting in your rake task quickly enough. Another user can likely comment on whether you stand to gain from running the rake task from another machine, or even simply from cranking up the resources on your Heroku instance.

Upvotes: 2
