green1919

Reputation: 289

Executing approximately 2 million parallel requests with Typhoeus

I have roughly 2.6 million records that need to be updated on an external service, one PUT request per record. This is a one-time job, so I have the following:

@hydra ||= Typhoeus::Hydra.hydra
million_records.each do |id|
  typhoeus_request = Typhoeus::Request.new(
    "http://localhost:300/posts/#{id}", # the URL is the first positional argument
    headers: {'content-type' => 'application/json'},
    params: {field1: 'Hello World'},
    method: :put
  )
  @hydra.queue typhoeus_request
end
@hydra.run

I've read the documentation on parallel requests, which states:

Hydra will also handle how many requests you can make in parallel. Things will get flakey if you try to make too many requests at the same time. The built in limit is 200. When more requests than that are queued up, hydra will save them for later and start the requests as others are finished.
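
To make the quoted behaviour concrete, here is a toy model in plain Ruby of a queue with a concurrency cap. It is only a sketch of the idea and not Typhoeus's actual implementation; `CappedQueue` and its methods are hypothetical names, and "finishing" a job is simulated by simply removing the oldest in-flight entry.

```ruby
# Toy model of a capped request queue (NOT Typhoeus internals).
class CappedQueue
  attr_reader :peak_in_flight, :completed

  def initialize(limit)
    @limit = limit
    @pending = []        # jobs waiting for a free slot
    @in_flight = []      # jobs currently "running"
    @peak_in_flight = 0
    @completed = []
  end

  def queue(job)
    @pending << job
  end

  # Start jobs up to the limit; whenever one finishes, start the next pending one.
  def run
    fill
    until @in_flight.empty?
      @completed << @in_flight.shift # pretend the oldest in-flight job finished
      fill
    end
  end

  private

  # Move pending jobs into flight until the cap is reached.
  def fill
    @in_flight << @pending.shift while @in_flight.size < @limit && !@pending.empty?
    @peak_in_flight = [@peak_in_flight, @in_flight.size].max
  end
end

q = CappedQueue.new(200)
1_000.times { |i| q.queue(i) }
q.run
puts q.peak_in_flight # 200
puts q.completed.size # 1000
```

Note that in this model every queued job still sits in memory until it is started, which is the part that matters when millions of requests are queued up front.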

My question is: are there any performance problems with the above approach? If so, how can I make it more performant?

Another option would be to create a new hydra instance on each iteration, queue the request on it, push the instance into an array, and then run the instances using the Parallel gem. For example:

batches = []

million_records.each do |id|
  hydra = Typhoeus::Hydra.new # a fresh instance per iteration; Typhoeus::Hydra.hydra returns a memoized one
  typhoeus_request = Typhoeus::Request.new(
    "http://localhost:300/posts/#{id}", # the URL is the first positional argument
    params: {field1: 'Hello World'},
    headers: {'content-type' => 'application/json'},
    method: :put
  )
  hydra.queue typhoeus_request
  batches.push(hydra)
end

Parallel.each(batches, in_threads: 5) do |batch|
  batch.run
end
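
A third variant, sketched here only for comparison with the two snippets above, is to run the requests in fixed-size batches so that only one batch of request objects is held in memory at a time. This is a minimal sketch under assumptions: `BATCH_SIZE` and `MAX_CONCURRENCY` are arbitrary values, and `update_in_batches`, `default_hydra`, `default_request`, `hydra_factory` and `request_factory` are hypothetical names introduced for illustration. The Typhoeus calls it relies on are `Typhoeus::Hydra.new(max_concurrency: ...)`, `Typhoeus::Request.new(url, options)`, `Request#on_complete`, `Response#success?`, `Hydra#queue` and `Hydra#run`.

```ruby
# Assumed values, not recommendations; tune them against the target server.
BATCH_SIZE = 10_000
MAX_CONCURRENCY = 50

# Builds a hydra with an explicit concurrency cap. Typhoeus is loaded lazily
# so the batching logic below has no hard dependency on the gem.
def default_hydra
  require "typhoeus"
  Typhoeus::Hydra.new(max_concurrency: MAX_CONCURRENCY)
end

# Builds the same PUT request as in the question.
def default_request(id)
  require "typhoeus"
  Typhoeus::Request.new(
    "http://localhost:300/posts/#{id}",
    method: :put,
    headers: { "content-type" => "application/json" },
    params: { field1: "Hello World" }
  )
end

# Queues and runs one batch at a time and returns the ids whose response
# was not successful. The two factories are hypothetical injection points
# so the batching can be exercised without a live server.
def update_in_batches(ids, batch_size: BATCH_SIZE,
                      hydra_factory: method(:default_hydra),
                      request_factory: method(:default_request))
  failed = []
  ids.each_slice(batch_size) do |batch|
    hydra = hydra_factory.call
    batch.each do |id|
      request = request_factory.call(id)
      # Record failures so they can be retried or inspected afterwards.
      request.on_complete { |response| failed << id unless response.success? }
      hydra.queue(request)
    end
    hydra.run # blocks until every request in this batch has completed
  end
  failed
end
```

With the defaults, calling `update_in_batches(million_records)` would return the ids that did not get a successful response, which could then be passed back in for a retry.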

Upvotes: 4

Views: 552

Answers (0)
