DRobinson

Reputation: 4471

Feasibility of Heroku Concurrency in Interim API Call?

Sorry for the confusing title; it was difficult to come up with one that fit my question.

To explain what I'm trying to do, I'll use an analogous example: Suppose I wanted to create an API call which, when provided an array of terms, searched Twitter for all of these terms and returned the tweets.

On one hand, I could do something simple, such as (if you'll excuse the pseudo-code):

require 'net/http'
require 'json'
require 'uri'

results = []
search_terms.each_with_index do |search_term, i|
  search_uri = URI.parse("http://search.twitter.com/search.json?q=#{URI.encode_www_form_component(search_term)}")
  # Blocking call: each request has to finish before the next one starts
  twitter_result = JSON.parse(Net::HTTP.get(search_uri))
  results[i] = twitter_result
end
render :json => results.to_json

But, of course, this might be slow if there are a couple dozen keywords, and more so if there are over 100, because Ruby has to wait for each request to finish before beginning the next.

Heroku's dynos seem like they should be able to make this work quickly (for example, the Dyno Overview states that "it is possible to originate outgoing requests directly from a dyno"). However, it seems that the means of accessing these tend to be through Delayed Job, Resque, etc., which tend to have very different use cases from what I've outlined.

From what I can tell, those ways of utilizing multiple dynos/workers won't work like typical threads: they can't access and modify instance variables instantiated by the caller/parent function. As far as I understand (and correct me if I'm wrong), they generally use their own memory and their own variables, and anything other functions need to access has to go through a cache or through the database.

Alright, so that doesn't make this idea impossible. A workaround could be: after creating these DelayedJobs, run a loop (with a timeout) in the parent function that fetches from the DB and checks whether the number of result rows equals the number of keywords, sleeping a short period and trying again if not. The DelayedJobs would create these rows after performing their task. It's not a particularly nice solution, and it ends up causing a bit of extra work for the server, but as the number of searches increases it would probably be significantly faster than just looping through them sequentially.
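To make that concrete, a rough sketch of the workaround might look like this (SearchJob and SearchResult are hypothetical names just for illustration; the job's #perform would do the Twitter request and save a row):

# Rough sketch of the polling workaround; assumes a SearchResult model
# and a SearchJob whose #perform does the Twitter request and writes a row.
batch_id = SecureRandom.uuid
search_terms.each do |term|
  Delayed::Job.enqueue(SearchJob.new(batch_id, term))
end

deadline = Time.now + 10 # give up after a few seconds
until SearchResult.where(:batch_id => batch_id).count == search_terms.size
  break if Time.now > deadline
  sleep 0.25 # avoid hammering the database
end

render :json => SearchResult.where(:batch_id => batch_id).map(&:payload).to_json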

So here's the question (and a couple related questions, that tie into the first):

What I'm wondering is, how feasible is it to set up a request on Heroku which, upon being called, spins up some new workers quickly, performs several requests in parallel, and then responds to the caller after all of these requests are complete?

Will the time required to spin up the workers and perform the DelayedJobs be too hefty to make this work? (The entire length of the request would, hopefully, only be a few seconds.)

Is there any suggestion about a cut-off number of requests where it's better just to do them in sequence, rather than use the DelayedJobs? (I'd expect not, and that I'd have to do some experimenting and benchmarking for this, to decide at what point to route to either option.)

Have I missed other options that would simplify this process (or, otherwise be more effective)?

EDIT: I should note, too, that the additional workers would be spun up on demand using a gem (something like hirefire, or similar) - I'm not sure how much this would factor in.

Upvotes: 0

Views: 108

Answers (1)

Thomas Klemm

Reputation: 10856

Retrieving information from external APIs is a blocking task in Ruby; that means the process or thread you are performing this request in will basically sleep while waiting for a response.
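For illustration, because the waiting is pure I/O, even plain Ruby threads can overlap these blocking calls (a minimal sketch, assuming the search_terms array from your question):

require 'net/http'
require 'json'
require 'uri'

# Each blocking request sleeps only its own thread, so the waits overlap
# instead of adding up one after another.
threads = search_terms.map do |term|
  Thread.new do
    uri = URI.parse("http://search.twitter.com/search.json?q=#{URI.encode_www_form_component(term)}")
    JSON.parse(Net::HTTP.get(uri))
  end
end
results = threads.map(&:value) # waits for all threads and collects their results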

To do a lot of concurrent I/O in the background (like searching Twitter, other HTTP requests, etc.), my library of choice in the Ruby/Rails world would be Sidekiq. You can read about its efficiency and advantages over delayed_job and resque in the wiki.
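A minimal sketch of how that might look (the TwitterSearchWorker name and the SearchResult model are placeholders for this example, not anything Sidekiq provides):

require 'sidekiq'
require 'net/http'
require 'json'
require 'uri'

# Sidekiq runs each #perform in its own thread, so many searches proceed concurrently.
class TwitterSearchWorker
  include Sidekiq::Worker

  def perform(search_term)
    uri = URI.parse("http://search.twitter.com/search.json?q=#{URI.encode_www_form_component(search_term)}")
    tweets = JSON.parse(Net::HTTP.get(uri))
    # Persist the result somewhere the web process can read it back later,
    # e.g. a database row (SearchResult is a placeholder model).
    SearchResult.create!(:term => search_term, :payload => tweets.to_json)
  end
end

# Enqueue one job per term; the Sidekiq process picks them up and runs them in parallel.
search_terms.each { |term| TwitterSearchWorker.perform_async(term) }

Note that the jobs run in a separate Sidekiq process, so your web request would still need to fetch the saved results afterwards, much like the polling approach you described.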

Upvotes: 1
