I am trying to download a binary file via HTTP using the following Ruby script.
#!/usr/bin/env ruby
require 'net/http'
require 'uri'

def http_download(resource, filename, debug = false)
  uri = URI.parse(resource)
  puts "Starting HTTP download for: #{uri}"
  http_object = Net::HTTP.new(uri.host, uri.port)
  http_object.use_ssl = true if uri.scheme == 'https'
  begin
    http_object.start do |http|
      request = Net::HTTP::Get.new uri.request_uri
      # Debug only: fetches the URL a second time and prints the body
      Net::HTTP.get_print(uri) if debug
      http.read_timeout = 500
      http.request request do |response|
        # Stream the body to disk in chunks; 'wb' keeps the bytes unmodified
        File.open(filename, 'wb') do |io|
          response.read_body do |chunk|
            io.write chunk
          end
        end
      end
    end
  rescue StandardError => e
    puts "=> Exception: '#{e}'. Skipping download."
    return
  end
  puts "Stored download as #{filename}."
end
However, it downloads the HTML source instead of the binary file. When I enter the URL in a browser, the binary file is downloaded. Here is a URL for which the script fails:
http://dcatlas.dcgis.dc.gov/catalog/download.asp?downloadID=2175&downloadTYPE=KML
I execute the script as follows:
pry> require 'myscript'
pry> resource = "http://dcatlas.dcgis.dc.gov/catalog/download.asp?downloadID=2175&downloadTYPE=KML"
pry> http_download(resource,"StreetTreePt.KML", true)
How can I download the binary?
I found this redirection check, which looks quite reasonable. When I integrate it into the response block, it fails with the following error:
Exception: 'undefined method `host' for "save_download.asp?filename=StreetTreePt.KML":String'. Skipping download.
The exception does not occur in the "original" function posted above.
The documentation for Net::HTTP shows how to handle redirects:
Following Redirection
Each Net::HTTPResponse object belongs to a class for its response code.
For example, all 2XX responses are instances of a Net::HTTPSuccess subclass, a 3XX response is an instance of a Net::HTTPRedirection subclass, and a 200 response is an instance of the Net::HTTPOK class. For details of response classes, see the section “HTTP Response Classes” in the Net::HTTP documentation.
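To make that hierarchy concrete, here is a small sketch. status_family is a hypothetical helper, not part of Net::HTTP; it finds which status-family class a given response class descends from. A case statement relies on the same relationship, because when compares using Module#===, which is an is_a? check.

```ruby
require 'net/http'

# The per-family base classes defined by net/http, in status-code order.
STATUS_FAMILIES = [
  Net::HTTPInformation, # 1XX
  Net::HTTPSuccess,     # 2XX
  Net::HTTPRedirection, # 3XX
  Net::HTTPClientError, # 4XX
  Net::HTTPServerError  # 5XX
].freeze

# Hypothetical helper: return the family class that response_class inherits
# from, or nil if it belongs to none of them.
def status_family(response_class)
  STATUS_FAMILIES.find { |family| response_class <= family }
end

puts status_family(Net::HTTPOK)       # prints Net::HTTPSuccess
puts status_family(Net::HTTPFound)    # prints Net::HTTPRedirection
puts status_family(Net::HTTPNotFound) # prints Net::HTTPClientError
```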
Using a case statement you can handle various types of responses properly:
def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'too many HTTP redirects' if limit == 0

  response = Net::HTTP.get_response(URI(uri_str))

  case response
  when Net::HTTPSuccess then
    response
  when Net::HTTPRedirection then
    location = response['location']
    warn "redirected to #{location}"
    fetch(location, limit - 1)
  else
    response.value
  end
end

print fetch('http://www.ruby-lang.org')
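Applied to the question, a minimal sketch might combine that redirect loop with the streaming download. It assumes the server answers with a 3XX whose Location header may be relative (the save_download.asp?filename=StreetTreePt.KML string in the question's error suggests it is), so the Location is resolved with URI.join against the URL just requested, and the file is written in binary mode ('wb'). The name download_following_redirects is hypothetical.

```ruby
require 'net/http'
require 'uri'

# Sketch: follow 3XX redirects, resolving a possibly relative Location header
# against the URL just requested, then stream the final body to disk.
# download_following_redirects is a hypothetical name, not a Net::HTTP API.
def download_following_redirects(url, filename, limit = 10)
  raise ArgumentError, 'too many HTTP redirects' if limit.zero?

  uri = URI(url)
  next_url = nil

  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(uri)) do |response|
      case response
      when Net::HTTPSuccess
        # 'wb' writes the bytes unchanged (binary mode, no newline translation)
        File.open(filename, 'wb') do |io|
          response.read_body { |chunk| io.write(chunk) }
        end
      when Net::HTTPRedirection
        # A relative value such as "save_download.asp?..." becomes an
        # absolute URL on the same host and directory
        next_url = URI.join(uri.to_s, response['location']).to_s
      else
        response.value # raises for non-2XX responses such as 4XX/5XX
      end
    end
  end

  # No redirect: the file has been written.
  return filename unless next_url

  warn "redirected to #{next_url}"
  download_following_redirects(next_url, filename, limit - 1)
end
```

With the question's variables it would be called as download_following_redirects(resource, "StreetTreePt.KML"). The recursive call is made after the first connection is closed, so only one connection is open at a time.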
Alternatively, Ruby's OpenURI follows redirects automatically. The Curb gem can also do it, and Typhoeus and HTTPClient probably can too.
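For comparison, a minimal OpenURI sketch, assuming Ruby 2.5 or later for URI.open (older versions patched Kernel#open instead). OpenURI resolves the redirect itself, including a relative Location, before yielding the final body. The helper name open_uri_download is hypothetical.

```ruby
require 'open-uri'

# Sketch: OpenURI follows the redirect chain and yields the final response.
# open_uri_download is a hypothetical helper name.
def open_uri_download(url, filename)
  URI.open(url) do |remote|
    # remote is a StringIO or Tempfile holding the downloaded body;
    # copy its bytes into a file opened in binary mode
    File.open(filename, 'wb') { |file| IO.copy_stream(remote, file) }
  end
  filename
end
```

The trade-off is less control: OpenURI buffers the body into memory or a Tempfile before you see it, whereas the Net::HTTP version streams chunks straight to disk.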
Going by the code in your question, the exception you are getting can only come from:
http_object = Net::HTTP.new(uri.host, uri.port)
which is unlikely, since uri is a URI object. You need to show the complete code if you want help with that problem.
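One hedged guess: the String in the error message, save_download.asp?filename=StreetTreePt.KML, looks like the raw value of a relative Location header. Assuming the integrated redirect check passed that String to code that calls .host on it, converting it into an absolute URI first would give it a host. A sketch, with resolve_location as a hypothetical helper:

```ruby
require 'uri'

# Hypothetical helper: resolve a Location header value (relative or absolute)
# against the URL that was requested. Returns a URI object, which has #host.
def resolve_location(request_url, location)
  URI.join(request_url.to_s, location)
end

base = 'http://dcatlas.dcgis.dc.gov/catalog/download.asp?downloadID=2175&downloadTYPE=KML'
target = resolve_location(base, 'save_download.asp?filename=StreetTreePt.KML')
puts target      # prints http://dcatlas.dcgis.dc.gov/catalog/save_download.asp?filename=StreetTreePt.KML
puts target.host # prints dcatlas.dcgis.dc.gov
```

An absolute Location value passes through URI.join unchanged, so the same call works for both kinds of redirect.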