Joe Bloggos

Reputation: 889

mechanize dealing with errors

As you'll see from a series of earlier questions, I have built a little Mechanize task that visits a page, finds the links to cafes, and saves the details of each cafe to a CSV.

task :estimateone => :environment do
  require 'mechanize'
  require 'csv'

  mechanize = Mechanize.new
  mechanize.history_added = Proc.new { sleep 30.0 }
  mechanize.ignore_bad_chunking = true
  mechanize.follow_meta_refresh = true

  page = mechanize.get('http://www.siteexamplea.com/city/list/50-city-cafes-you-should-have-eaten-breakfast-at')

  results = []
  results << ['name', 'streetAddress', 'addressLocality', 'postalCode', 'addressRegion', 'addressCountry', 'telephone', 'url']

  page.css('ol li a').each do |link|
    mechanize.click(link)

    name = mechanize.page.css('article h1[itemprop="name"]').text.strip
    streetAddress = mechanize.page.css('address span span[itemprop="streetAddress"]').text.strip
    addressLocality = mechanize.page.css('address span span[itemprop="addressLocality"]').text.strip
    postalCode = mechanize.page.css('address span span[itemprop="postalCode"]').text.strip
    addressRegion = mechanize.page.css('address span span[itemprop="addressRegion"]').text.strip
    addressCountry = mechanize.page.css('address span meta[itemprop="addressCountry"]').text.strip
    telephone = mechanize.page.css('address span[itemprop="telephone"]').text.strip
    url = mechanize.page.css('article p a[itemprop="url"]').text.strip

    results << [name, streetAddress, addressLocality, postalCode, addressRegion, addressCountry, telephone, url]
  end

  CSV.open("filename.csv", "w+") do |csv_file|
    results.each do |row|
      csv_file << row
    end
  end
end

When I get to the tenth link, I hit a 503 error:

Mechanize::ResponseCodeError: 503 => Net::HTTPServiceUnavailable for https://www.city.com/city/directory/morning-after -- unhandled response

I have tried a couple of things to prevent this or to rescue from it, but I can't work it out. Any tips?

Upvotes: 1

Views: 481

Answers (1)

Alex.U

Reputation: 1701

You'll want to rescue the failed request, like this:

task :estimateone => :environment do
  require 'mechanize'
  require 'csv'

  # ...
  page.css('ol li a').each do |link|
    begin
      mechanize.click(link)
    rescue Mechanize::ResponseCodeError
      # do something with the result: log it, mark the link as failed,
      # wait a bit, and then continue with the next link
      next
    end
    # ... scrape the cafe details as before ...
  end
end

My guess is that you're hitting rate limits. Rescuing will not solve the underlying problem, since it is on the server's side rather than yours, but it gives you room to work: you can now flag the links that failed and continue from there.
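Since a 503 from rate limiting is often transient, retrying with a growing delay can recover many of these requests. Below is a minimal sketch of a hypothetical `with_retries` helper (not part of Mechanize; the name, attempt count, and delays are assumptions) that you could wrap around `mechanize.click(link)` inside the loop:

```ruby
# Hypothetical helper: run a block, and if it raises the given error class,
# sleep with a linearly growing delay and retry, re-raising once the
# attempts are exhausted.
def with_retries(max_attempts: 3, base_delay: 30, error_class: StandardError)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue error_class
    raise if attempts >= max_attempts
    sleep(base_delay * attempts) # back off: 30s, then 60s, ...
    retry
  end
end

# Usage inside the loop, with the error class from the stack trace
# in the question:
#
#   with_retries(error_class: Mechanize::ResponseCodeError) do
#     mechanize.click(link)
#   end
```

If a link still fails after all attempts, the error propagates, so you can combine this with the rescue-and-flag pattern above.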

Upvotes: 1
