Reputation: 2084
I'm building a web crawler using Mechanize for Ruby. I'll be running batches of 200k sites at a time, and when a GET request returns an error I want to set an instance variable marking the site as not valid and move on to the next site. For example, one site I'm crawling responds to an HTTP GET request with "Error 101 (net::ERR_CONNECTION_RESET): The connection was reset." and my application crashes.
def crawl
  agent = Mechanize.new
  agent.log = Logger.new('out.log')
  agent.user_agent_alias = 'Mac Safari'
  begin
    page = agent.get(@url)
  rescue Mechanize::ResponseCodeError => exception
    if exception.response_code == '400' || exception.response_code == '500'
      @isActive = false
      return
    end
  end
end
Is there an exception I should catch so I can recover from ERR_CONNECTION_RESET, or what approach have you used to handle this?
Upvotes: 0
Views: 242
Reputation: 55002
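Why not catch everything? A bare rescue catches StandardError and its subclasses, which includes connection resets (Errno::ECONNRESET):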
begin
  page = agent.get(@url)
rescue
  @isActive = false
end
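If catching everything feels too broad, a narrower variant is to rescue only the errors you expect from the network and record which one occurred. The following is a minimal sketch, not Mechanize's own API: the `Site` class, the `NETWORK_ERRORS` list, and the `active`/`error` attributes are hypothetical names chosen for illustration, and the fetch is passed in as a block so the example runs without Mechanize installed. I'm assuming a connection reset surfaces as `Errno::ECONNRESET`; depending on your Mechanize version it may be wrapped in another exception class, so check your log and extend the list accordingly.

```ruby
require 'socket'   # defines SocketError
require 'timeout'  # defines Timeout::Error

# Hypothetical list of errors treated as "site is not valid".
# When Mechanize is loaded you would likely add
# Mechanize::ResponseCodeError here as well.
NETWORK_ERRORS = [
  Errno::ECONNRESET,
  Errno::ECONNREFUSED,
  SocketError,
  Timeout::Error
].freeze

# Hypothetical wrapper around one site in the batch.
class Site
  attr_reader :url, :active, :error

  def initialize(url)
    @url = url
    @active = true
    @error = nil
  end

  # Yields the URL to the block that performs the request.
  # Returns the block's result, or nil if a listed error was raised.
  def crawl
    yield url
  rescue *NETWORK_ERRORS => e
    @active = false          # mark the site as not valid
    @error = e.class.name    # remember why, for later reporting
    nil
  end
end
```

With Mechanize this would be called as `site.crawl { |u| agent.get(u) }`. Any exception outside the list still propagates, so unexpected bugs are not silently swallowed the way they are with a bare rescue.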
Upvotes: 1