Timur Nugmanov

Reputation: 893

Download files asynchronously

I'm trying to write a script that downloads all the images and videos from a thread on my favourite imageboard, 2ch.hk.
It worked until I tried to download the files asynchronously, in parallel threads, to improve performance.
Here is the code: http://ideone.com/k2l4Hm

require 'net/http'
multithreading = false
Net::HTTP.start("2ch.hk", :use_ssl => true) do |http|
  thread = http.get("/b/res/133467978.html").body
  sources = []
  thread.scan(/<a class="desktop" target="_blank" href=".+">.+<\/a>/).each do |a|
    source = "/b#{/<a class="desktop" target="_blank" href="\.\.(.+)">.+<\/a>/.match(a).to_a[1]}"
    sources << source
  end
  i = 0
  start = Time.now
  if multithreading
    threads = []
    sources.each do |source|
      threads << Thread.new(i) do |j|
        file = http.get(source).body #breaks everything
        # type = /.+\.(.+)/.match(source)[1]
        # open("#{j}.#{type}","wb") { |new_file|
        #   new_file.write(file)
        # }
      end
      i += 1
    end
    threads.each do |thr|
      thr.join
    end
    # until downloade=sources.size
    #
    # end
  else
    sources.each do |source|
      file = http.get(source).body
      type = /.+\.(.+)/.match(source)[1]
      open("#{i}.#{type}","wb") { |new_file|
        new_file.write(file)
      }
      i += 1
      print "#{(((i).to_f / sources.size) * 100).round(2)}% "
    end
    puts
  end
  puts "Done. #{i} files were downloaded. It took #{Time.now - start} seconds"
end
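A side note on the parsing step: the pattern href=".+" in the scan is greedy, so if two such links ever appear on the same line, one match can swallow both. Below is a minimal sketch (not the asker's code) that extracts the paths in one pass, using [^"]+ so each match stops at the closing quote, and builds the output file name with File.extname instead of a hand-written regex. The helper names extract_sources and output_name are hypothetical, and the markup shape is inferred from the question's own regex, not checked against 2ch.hk.

```ruby
# Hypothetical helpers (not from the original post). The expected markup is
# inferred from the question's regex:
#   <a class="desktop" target="_blank" href="../src/...">name</a>
LINK_RE = %r{<a class="desktop" target="_blank" href="\.\.([^"]+)">[^<]*</a>}

# Returns board-relative paths such as "/b/src/...", one per matching link.
# [^"]+ stops at the closing quote, so two links on one line stay separate.
def extract_sources(html, board: "b")
  html.scan(LINK_RE).map { |m| "/#{board}#{m.first}" }
end

# Builds the local file name "<index>.<ext>" like the question does,
# but with File.extname instead of /.+\.(.+)/.
def output_name(index, source)
  "#{index}#{File.extname(source)}"
end

html = '<a class="desktop" target="_blank" href="../src/1/a.jpg">a.jpg</a>' \
       '<a class="desktop" target="_blank" href="../src/1/b.webm">b.webm</a>'
p extract_sources(html)  # => ["/b/src/1/a.jpg", "/b/src/1/b.webm"]
```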

I suspect this line is what breaks everything:

file = http.get(source).body

Or maybe the problem is here:

threads.each do |thr|
  thr.join
end
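The join loop on its own is the standard way to wait for threads; the asker's own answer below traces the failure to the shared Net::HTTP object instead. A small network-free sketch (the helper name indexed_threads is hypothetical) of the same fan-out/join structure, where Thread.new(i) hands each thread its own copy of the index and Thread#value joins the thread and then returns its block's result:

```ruby
# Same fan-out/join shape as the question, minus the network.
# Thread.new(i, item) passes the current values as block arguments, so each
# thread keeps its own index even though i changes on the next iteration.
def indexed_threads(items)
  threads = []
  i = 0
  items.each do |item|
    threads << Thread.new(i, item) { |j, x| "#{j}:#{x}" }
    i += 1
  end
  threads.map(&:value)  # Thread#value joins the thread, then returns its result
end

p indexed_threads(%w[a b c])  # => ["0:a", "1:b", "2:c"]
```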


The error message is different every time, ranging from "Bad file descriptor" and other IO errors to "You may have encountered a bug in the Ruby interpreter or extension libraries."
If you want to run my code, please replace the thread path on the "thread = http.get(...)" line with a current thread from 2ch.hk/b, because the one in my code may have been deleted by the time you try it.
Ruby version: 2.3.1; OS: Xubuntu 16.10.

Upvotes: 0

Views: 449

Answers (2)

quinn

Reputation: 5998

You'll probably get much better performance from a Ruby HTTP library that supports parallel requests, such as Typhoeus:

https://github.com/typhoeus/typhoeus

For example:

require 'typhoeus'

hydra = Typhoeus::Hydra.new
10.times { hydra.queue(Typhoeus::Request.new("www.example.com", followlocation: true)) }
hydra.run # runs all queued requests in parallel

Upvotes: 2

Timur Nugmanov

Reputation: 893

The problem with my code is that a single Net::HTTP instance can't handle multiple requests at the same time, and all my threads were sharing the one connection opened by Net::HTTP.start. The solution is to open a separate HTTP connection in each thread.
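A minimal sketch of that fix, assuming a fixed pool of worker threads that pull paths from a Queue; fetch_all and its open_connection parameter are hypothetical names, not part of Net::HTTP. Each worker calls open_connection once, so every thread works on its own Net::HTTP session instead of sharing the outer one. The connection opener is passed in as a lambda so the pooling logic can be exercised without network access.

```ruby
require 'net/http'

# Hypothetical helper: downloads every path in `paths` with up to `workers`
# threads. Each thread opens its OWN connection through `open_connection`,
# so no Net::HTTP object is ever used by two threads at once.
def fetch_all(paths, workers: 4, open_connection:)
  queue = Queue.new
  paths.each_with_index { |path, i| queue << [path, i] }
  bodies = Array.new(paths.size)
  lock = Mutex.new

  threads = Array.new([workers, paths.size].min) do
    Thread.new do
      open_connection.call do |http|      # connection private to this thread
        loop do
          begin
            path, i = queue.pop(true)     # non-blocking; raises ThreadError when empty
          rescue ThreadError
            break                         # queue drained: this worker is done
          end
          body = http.get(path).body
          lock.synchronize { bodies[i] = body }
        end
      end
    end
  end
  threads.each(&:join)
  bodies                                  # same order as `paths`
end

# Usage against the imageboard (sources built as in the question):
#   opener = ->(&blk) { Net::HTTP.start("2ch.hk", use_ssl: true, &blk) }
#   fetch_all(sources, workers: 8, open_connection: opener).each_with_index do |body, i|
#     File.binwrite("#{i}#{File.extname(sources[i])}", body)
#   end
```

A bounded pool (rather than one thread per file, as in the question) also keeps the number of simultaneous connections to the server under control.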

Upvotes: 0
