Kevin

Reputation: 187

How to make multiple parallel concurrent requests with Rails and Heroku

I am currently developing a Rails application which takes a long list of links as input, scrapes them using a background worker (Resque), then serves the results to the user. However, in some cases there are numerous URLs, and I would like to be able to make the requests in parallel / concurrently so that the whole run takes much less time, rather than waiting for one request to a page to complete, scraping it, and moving on to the next one.

Is there a way to do this on Heroku/Rails? Where might I find more information?

I've come across resque-pool but I'm not sure whether it would solve this issue and/or how to implement it. I've also read about using different types of servers to run Rails in order to make concurrency possible, but don't know how to modify my current situation to take advantage of this.

Any help would be greatly appreciated.

Upvotes: 0

Views: 1732

Answers (2)

Gabriel

Reputation: 169

Adding these two lines to your code will also let you wait until the last job is complete before proceeding:

  • this line ensures that your program waits until at least one job has been enqueued or picked up before checking that all jobs are completed, so as to avoid misinterpreting a not-yet-filled queue as the completion of all jobs

sleep(0.2) until Sidekiq::Queue.new.size > 0 || Sidekiq::Workers.new.size > 0

  • this line ensures your program waits until all jobs are done

sleep(0.5) until Sidekiq::Workers.new.size == 0 && Sidekiq::Queue.new.size == 0
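Taken together, a minimal sketch of how these waits might wrap the enqueueing step (the links array and ScrapeWorker class here are assumptions for illustration; Sidekiq::Queue and Sidekiq::Workers come from sidekiq/api):

require 'sidekiq/api'  # provides Sidekiq::Queue and Sidekiq::Workers

# links and ScrapeWorker are hypothetical stand-ins for your own data and worker class
links.each { |link| ScrapeWorker.perform_async(link) }

# wait until at least one job has been enqueued or picked up,
# so an empty queue isn't mistaken for "everything finished"
sleep(0.2) until Sidekiq::Queue.new.size > 0 || Sidekiq::Workers.new.size > 0

# then wait until the queue is drained and no worker is busy
sleep(0.5) until Sidekiq::Workers.new.size == 0 && Sidekiq::Queue.new.size == 0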

Upvotes: 0

Adrian

Reputation: 425

Don't use Resque. Use Sidekiq instead.

Resque runs in a single-threaded process, meaning its workers run jobs synchronously, one at a time, while Sidekiq runs in a multithreaded process, meaning its workers run asynchronously/simultaneously in different threads.

Make sure you assign one URL to scrape per worker. It's no use if one worker scrapes multiple URLs one after another.

With Sidekiq, you can pass the link to a worker, e.g.

LINKS = [...]
LINKS.each do |link|
  ScrapeWorker.perform_async(link)
end

perform_async doesn't actually execute the job right away. Instead, the link is just put in a queue in Redis along with the worker class name and arguments, and later (possibly only milliseconds later) a worker thread picks up each job in the queue and executes it in its own thread by running the perform instance method on ScrapeWorker. Sidekiq will also make sure to retry the job if an exception occurs while a worker is executing it.
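A minimal sketch of what that worker class might look like (the HTTP fetch and Nokogiri parsing are just placeholder scraping logic, not something prescribed above):

require 'sidekiq'
require 'net/http'
require 'nokogiri'

class ScrapeWorker
  include Sidekiq::Worker

  # perform runs in its own thread for each queued link
  def perform(link)
    html = Net::HTTP.get(URI(link))   # fetch the page
    doc  = Nokogiri::HTML(html)       # parse it
    # ... extract whatever data you need and persist it ...
  end
end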

PS: You don't have to pass a link to the worker. You can store the links in a table and then pass the ids of the records to the workers.
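For example, assuming a hypothetical Link ActiveRecord model with a url column, that variant could look like:

Link.find_each do |record|
  ScrapeWorker.perform_async(record.id)   # enqueue the record id instead of the URL
end

class ScrapeWorker
  include Sidekiq::Worker

  def perform(link_id)
    link = Link.find(link_id)   # look the record up inside the job
    # ... scrape link.url and store the results on the record ...
  end
end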

More info about Sidekiq

Upvotes: 1
