justinraczak

Reputation: 786

Rails/Heroku - How to create a background job for process that requires file upload

I run my Rails app on Heroku. I have an admin dashboard that allows for creating new objects in bulk through a custom CSV uploader. Ultimately I'll be uploading CSVs with 10k-35k rows. The parser works perfectly in my dev environment, and 20k+ entries are successfully created by uploading the CSV. On Heroku, however, I run into H12 errors (request timeout). This obviously makes sense since the files are so large and so many objects are being created. To get around this I tried some simple solutions: bumping up the dyno power on Heroku and reducing the CSV file to 2,500 rows. Neither of these did the trick.

I tried to use my delayed_job implementation, in combination with adding a worker dyno to my Procfile, to .delay the file upload and processing so that the web request wouldn't time out waiting for the file to be processed. This fails, though, because the background process relies on a CSV upload that is only held in memory at the time of the web request, so the background job doesn't have the file when it executes.

It seems like what I might need to do is:

  1. Execute the upload of the CSV to S3 as a background process
  2. Schedule the processing of the CSV file as a background job
  3. Make sure the CSV parser knows how to find the file on S3
  4. Parse and finish

This solution isn't 100% ideal, as the admin user who uploads the file will essentially get an "ok, you sent the instructions" confirmation without good visibility into whether the process is executing properly. But I can handle that and fix it later if it gets the job done.
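To make the intent concrete, here is roughly the shape I have in mind; the CsvUpload model, CsvFileUploader, Admin::ImportsController, and CsvImporter are made-up names, and the part I'm unsure about is called out in the comments:

    # app/models/csv_upload.rb -- stores the raw CSV on S3 via Carrierwave
    class CsvUpload < ActiveRecord::Base
      mount_uploader :file, CsvFileUploader   # uploader configured with storage :fog (S3)
    end

    # app/controllers/admin/imports_controller.rb
    class Admin::ImportsController < ApplicationController
      def create
        # Saving the record pushes the file to S3 during the web request;
        # the expensive row-by-row creation should happen later on the worker dyno.
        upload = CsvUpload.create!(file: params[:file])

        # Enqueue the processing with delayed_job. Passing just the record id is
        # my best guess at how the job should later locate the file on S3, but
        # this is the part I'm unclear about.
        CsvImporter.new.delay.import(upload.id)

        redirect_to admin_imports_path, notice: "Import queued"
      end
    end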

tl;dr question

Assuming the above-mentioned solution is the right/recommended approach, how can I structure this properly? I am mostly unclear on how to schedule/create a delayed_job entry that knows where to find a CSV file uploaded to S3 via Carrierwave. Any and all help much appreciated.

Please request any code that's helpful.

Upvotes: 1

Views: 1098

Answers (2)

korada

Reputation: 586

You can put the files that need to be processed in a specific S3 bucket and eliminate the need to pass file names to the background job.

The background job can then fetch files from that S3 bucket and start processing them.
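For example, a job that drains a dedicated bucket using the aws-sdk gem might look like the sketch below; the bucket name, PendingImportsJob, and CsvRowImporter are placeholders, not names from your app:

    require 'aws-sdk'
    require 'csv'

    # Placeholder job that processes every CSV sitting in a dedicated imports bucket.
    class PendingImportsJob
      BUCKET = 'myapp-pending-csv-imports'

      def perform
        s3 = AWS::S3.new   # credentials come from AWS.config or Heroku config vars
        s3.buckets[BUCKET].objects.each do |object|
          CSV.parse(object.read, headers: true) do |row|
            CsvRowImporter.import(row)   # stand-in for your existing per-row creation logic
          end
          object.delete                  # see point 3 below: clean up once processed
        end
      end
    end

    # Enqueue it with delayed_job right after the upload finishes:
    # Delayed::Job.enqueue(PendingImportsJob.new)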

To provide real-time updates to the user, you can do the following:

  1. Use memcached to maintain the status. The background job should keep updating the status information as it runs. If you are not familiar with caching, you can use a db table instead (see the sketch after this list).

  2. Include javascript/jquery in the response sent to the user. This script should make ajax requests to get the status information and show progress to the user. But if it is a big file, the user may not want to wait for the job to complete, in which case it is better to provide a query interface for checking job status later.

  3. The background job should delete/move the file from the bucket on completion so it is not processed again.
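A minimal sketch of points 1 and 2, assuming the Rails cache is backed by memcached; the job class, key names, and status controller are made up:

    # Inside the background job: periodically write progress to the Rails cache,
    # keyed by the id of the import being processed.
    class CsvImportJob
      def initialize(upload_id)
        @upload_id = upload_id
      end

      def perform
        rows = fetch_rows       # placeholder: however the job loads the CSV rows
        rows.each_with_index do |row, index|
          create_record(row)    # placeholder: your existing per-row logic
          if (index % 100).zero?
            Rails.cache.write(status_key, processed: index + 1, total: rows.size)
          end
        end
        Rails.cache.write(status_key, processed: rows.size, total: rows.size, done: true)
      end

      private

      def status_key
        "csv_import:#{@upload_id}:status"
      end
    end

    # Made-up controller the admin page can poll with ajax every few seconds:
    class ImportStatusesController < ApplicationController
      def show
        render json: Rails.cache.read("csv_import:#{params[:id]}:status") || { processed: 0 }
      end
    end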

In our app, we let users import data for multiple models and developed a generic design for it. We maintain the status information in the db since we perform some analytics on it. If you are interested, here is a blog article (http://koradainc.com/blog/) that describes our design. The article does not cover the background process or S3, but combined with the steps above it should give you a full solution.

Upvotes: 1

miahabdu

Reputation: 614

I've primarily used Sidekiq to queue asynchronous processes on Heroku.

This link is also a great resource to help you get started with implementing Sidekiq on Heroku.
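For your CSV case, a bare-bones worker could look something like this; CsvUpload (with a Carrierwave file stored on S3) and Product are placeholders taken from your description:

    require 'open-uri'
    require 'csv'

    # app/workers/csv_import_worker.rb -- minimal Sidekiq worker sketch.
    # Only the small record id goes onto the queue; the worker re-fetches the
    # CSV from S3 when it runs, so nothing depends on the web request's memory.
    class CsvImportWorker
      include Sidekiq::Worker
      sidekiq_options queue: :imports, retry: 3

      def perform(upload_id)
        upload = CsvUpload.find(upload_id)
        CSV.parse(open(upload.file.url).read, headers: true) do |row|
          Product.create!(row.to_hash)   # stand-in for your actual per-row creation
        end
      end
    end

    # Enqueue from the controller after the upload is saved:
    #   CsvImportWorker.perform_async(upload.id)
    #
    # And run the worker dyno via the Procfile, e.g.:
    #   worker: bundle exec sidekiq -q imports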

Upvotes: 1
