JJD
JJD

Reputation: 51814

How to download a binary file via Net::HTTP::Get?

I am trying to download a binary file via HTTP using the following Ruby script.

#!/usr/bin/env ruby
require 'net/http'
require 'uri'

def http_download(resource, filename, debug = false)
  uri = URI.parse(resource)
  puts "Starting HTTP download for: #{uri}"
  http_object = Net::HTTP.new(uri.host, uri.port)
  http_object.use_ssl = true if uri.scheme == 'https'
  begin
    http_object.start do |http|
      request = Net::HTTP::Get.new uri.request_uri
      Net::HTTP.get_print(uri) if debug
      http.read_timeout = 500
      http.request request do |response|
        open filename, 'w' do |io|
          response.read_body do |chunk|
            io.write chunk
          end
        end
      end
    end
  rescue Exception => e
    puts "=> Exception: '#{e}'. Skipping download."
    return
  end
  puts "Stored download as #{filename}."
end

However it downloads the HTML source instead of the binary. When I enter the URL in the browser the binary file is downloaded. Here is a URL with which the script fails:

http://dcatlas.dcgis.dc.gov/catalog/download.asp?downloadID=2175&downloadTYPE=KML

I execute the script as follows

pry> require 'myscript'
pry> resource = "http://dcatlas.dcgis.dc.gov/catalog/download.asp?downloadID=2175&downloadTYPE=KML"
pry> http_download(resource,"StreetTreePt.KML", true)

How can I download the binary?

Redirection experiments

I found this redirection check which looks quite reasonable. When I integrate in the response block it fails with the following error:

Exception: 'undefined method `host' for "save_download.asp?filename=StreetTreePt.KML":String'. Skipping download.

The exception does not occur in the "original" function posted above.

Upvotes: 2

Views: 2838

Answers (1)

the Tin Man
the Tin Man

Reputation: 160551

The documentation for Net::HTTP shows how to handle redirects:

Following Redirection

Each Net::HTTPResponse object belongs to a class for its response code.

For example, all 2XX responses are instances of a Net::HTTPSuccess subclass, a 3XX response is an instance of a Net::HTTPRedirection subclass and a 200 response is an instance of the Net::HTTPOK class. For details of response classes, see the section “HTTP Response Classes” below.

Using a case statement you can handle various types of responses properly:

def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'too many HTTP redirects' if limit == 0

  response = Net::HTTP.get_response(URI(uri_str))

  case response
  when Net::HTTPSuccess then
    response
  when Net::HTTPRedirection then
    location = response['location']
    warn "redirected to #{location}"
    fetch(location, limit - 1)
  else
    response.value
  end
end

print fetch('http://www.ruby-lang.org')

Or, you can use Ruby's OpenURI, which handles it automatically. Or, the Curb gem will do it. Probably Typhoeus and HTTPClient too.

According to the code you show in your question, the exception you are getting can only come from:

http_object = Net::HTTP.new(uri.host, uri.port)

which is hardly likely since uri is a URI object. You need to show the complete code if you want help with that problem.

Upvotes: 3

Related Questions