Saurabh Sharma

Reputation: 2341

Add images to fetch in a queue then grab them

I am fairly new to Ruby multi-threading and am confused about how to get started. I am building an app that needs to fetch a LOT of images, so I want to do the fetching in a separate thread. I want the program to execute as shown in the code below.

PROBLEM: I expect bar_method to fetch images faster than foo_method adds them. When it empties the queue, its thread will end, so items added after that point will never be processed. Is there a synchronization mechanism that notifies the bar_method thread when a new item is added to the queue, so that if bar_method catches up, it sleeps until the next item arrives instead of exiting?
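The wait-and-notify behaviour described here is what Ruby's `ConditionVariable` provides: `wait` releases the mutex and puts the thread to sleep until another thread calls `signal`. Below is a minimal sketch. It assumes the image fetch is simulated by handing the URL to the callback, and it uses a hypothetical `done` flag to tell the consumer that no more items are coming.

```ruby
# Producer/consumer sketch using Mutex + ConditionVariable.
# Assumptions: "fetching" is simulated by passing the URL to the callback,
# and `done` is a hypothetical flag meaning "no more items will be added".
mutex   = Mutex.new
cond    = ConditionVariable.new
queue   = []
done    = false
results = []

consumer = Thread.new do
  loop do
    item = mutex.synchronize do
      # Sleep (releasing the mutex) until there is work or production has ended.
      # The loop also guards against spurious wakeups.
      cond.wait(mutex) while queue.empty? && !done
      queue.shift # nil only when the queue is empty and done is true
    end
    break if item.nil?
    item[:callback].call(item[:url]) # do the slow work outside the lock
  end
end

callback = ->(url) { results << url }

5.times do |i|
  mutex.synchronize do
    queue << { url: "http://example.com/img#{i}.png", callback: callback }
    cond.signal # wake the consumer if it is waiting
  end
end

mutex.synchronize do
  done = true
  cond.signal # let the consumer notice that production has finished
end

consumer.join
```

The consumer only exits once it sees both an empty queue and `done`, so it can never stop early while the producer is still running.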

def foo_method(image_urls, the_callback)
  # each queue entry holds the URL to fetch and the callback to run
  queue = []
  synch = Mutex.new
  Thread.new do
    bar_method(synch, queue)
  end
  image_urls.each do |img_url|
    synch.synchronize do
      queue << { url: img_url, method_callback: the_callback }
    end
  end
end

def bar_method(synch_obj, queue)
  synch_obj.synchronize do
    until queue.empty?
      item = queue.shift
      # fetch the image here, then pass the result to the callback
      item[:method_callback].call(item[:url])
    end
  end
end
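Ruby's built-in `Thread::Queue` already does the locking and waiting the question asks about: `pop` puts the calling thread to sleep while the queue is empty and wakes it when an item is pushed. Below is a minimal sketch. It assumes Ruby 2.3+ (for `Queue#close`) and uses a hypothetical `fetch_image` lambda in place of the real download.

```ruby
# Sketch: Thread::Queue handles locking and waiting internally.
# Assumption: fetch_image is a hypothetical stand-in for the real download.
queue       = Thread::Queue.new
fetched     = []
fetch_image = ->(url) { "data for #{url}" } # replace with a real HTTP call

worker = Thread.new do
  # Queue#pop blocks while the queue is empty and returns nil once the
  # queue has been closed and drained (Queue#close needs Ruby 2.3+).
  while (job = queue.pop)
    job[:callback].call(fetch_image.call(job[:url]))
  end
end

on_fetched = ->(data) { fetched << data }

10.times do |i|
  queue << { url: "http://example.com/#{i}.jpg", callback: on_fetched }
end

queue.close # no more work; the worker exits after draining what is left
worker.join
```

Because `pop` waits instead of returning when the queue is momentarily empty, the worker no longer finishes early. Closing the queue is what tells it to stop.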

Upvotes: 3

Views: 87

Answers (1)

the Tin Man

Reputation: 160571

If you need to retrieve files from the internet using parallel requests, I highly recommend Typhoeus and its Hydra class.

From the documentation:

hydra = Typhoeus::Hydra.new
10.times.map{ hydra.queue(Typhoeus::Request.new("www.example.com", followlocation: true)) }
hydra.run

You can set the number of concurrent connections in Hydra:

:max_concurrency (Integer) — Number of max concurrent connections to create. Default is 200.
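The effect of a concurrency cap can be illustrated in plain Ruby without a network connection. The sketch below is an analogy, not how Hydra implements `:max_concurrency`. A hypothetical `MAX_CONCURRENCY` number of threads pull URLs from one `Thread::Queue`, so at most that many "downloads" run at once. The download is simulated with `sleep`, and `peak` records the highest number of jobs observed in flight.

```ruby
# Sketch: cap concurrency with a fixed pool of worker threads sharing one queue.
# Assumptions: MAX_CONCURRENCY is a hypothetical pool size, and the
# "download" is simulated with a short sleep.
MAX_CONCURRENCY = 4
urls = (1..20).map { |i| "http://example.com/#{i}.png" }

jobs    = Thread::Queue.new
results = Thread::Queue.new
urls.each { |u| jobs << u }
jobs.close # workers stop once every URL has been taken

in_flight = 0
peak      = 0
counter   = Mutex.new

workers = Array.new(MAX_CONCURRENCY) do
  Thread.new do
    while (url = jobs.pop)
      counter.synchronize { in_flight += 1; peak = [peak, in_flight].max }
      sleep 0.01 # stand-in for the real request
      counter.synchronize { in_flight -= 1 }
      results << url
    end
  end
end

workers.each(&:join)
fetched = Array.new(results.size) { results.pop }
```

Raising or lowering `MAX_CONCURRENCY` changes how many requests can overlap, which is the trade-off a setting like `:max_concurrency` exposes.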

As a second recommendation, look into Curb. Again, from its documentation:

# make multiple GET requests
easy_options = {:follow_location => true}
multi_options = {:pipeline => true}

Curl::Multi.get(['url1','url2','url3','url4','url5'], easy_options, multi_options) do |easy|
  # do something interesting with the easy response
  puts easy.last_effective_url
end

Both are built on top of libcurl, so there's no real difference in their underlying technology or its robustness. The difference lies in the API each one gives you.

Another gem that gets a lot of attention is EventMachine. Its companion gem, em-http-request, supports concurrent requests:

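require 'eventmachine'
require 'em-http'

EventMachine.run {
  http1 = EventMachine::HttpRequest.new('http://google.com/').get
  http2 = EventMachine::HttpRequest.new('http://yahoo.com/').get

  http1.callback { }  # called when the first response has arrived
  http2.callback { }  # called when the second response has arrived
  # call EventMachine.stop once every response is handled, or the reactor keeps running
}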

Upvotes: 2
