Reputation: 11244
I am using Nokogiri to scrape web pages. A few URLs have to be guessed, and they return a 404 Not Found error when the page doesn't exist. Is there a way to capture this exception?
http://yoursite/page/38475 #=> page number 38475 doesn't exist
I tried the following, which didn't work:
url = "http://yoursite/page/38475"
doc = Nokogiri::HTML(open(url)) do
begin
rescue Exception => e
puts "Try again later"
end
end
Upvotes: 10
Views: 8570
Reputation: 51151
It doesn't work because your begin/rescue does not wrap the code that raises the error: the open(url) call, which raises OpenURI::HTTPError when the response status is 404. The following code should work:
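require 'open-uri'
require 'nokogiri'

url = 'http://yoursite/page/38475'
begin
  # This is the call that raises on a 404, so it must sit inside begin/rescue.
  # On Ruby versions before 3.0, a plain open(url) also works.
  file = URI.open(url)
  doc = Nokogiri::HTML(file)
  # handle doc
rescue OpenURI::HTTPError => e
  if e.message == '404 Not Found'
    # handle 404 error
  else
    # any other HTTP error is passed on to the caller
    raise e
  end
end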
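As a variation on the code above, here is a minimal sketch that checks the numeric status code instead of comparing the message string. The helper name `open_unless_missing` and its `opener:` parameter are hypothetical (they are not part of open-uri or Nokogiri); `opener:` exists only so the helper can be tried without a network connection.

```ruby
require 'open-uri'

# Hypothetical helper: returns the opened page, or nil when the server
# answers 404. Any other HTTP error is re-raised unchanged.
def open_unless_missing(url, opener: URI.method(:open))
  opener.call(url)
rescue OpenURI::HTTPError => e
  # e.io.status is a [code, reason] pair such as ["404", "Not Found"]
  raise unless e.io.status.first == '404'
  nil
end

# Possible usage with Nokogiri (commented out, since it needs network access):
# file = open_unless_missing('http://yoursite/page/38475')
# doc = Nokogiri::HTML(file) if file
```

Returning nil for a missing page keeps the "guess a URL" loop simple: the caller can skip to the next candidate when the result is nil, while real failures such as a 500 still surface as exceptions.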
BTW, about rescuing Exception, see: "Why is it a bad style to `rescue Exception => e` in Ruby?"
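To illustrate the point about `rescue Exception`, here is a small sketch. The helper `caught_by_bare_rescue?` is a hypothetical name used only for this illustration; it relies on the fact that a bare `rescue` (or `rescue => e`) catches StandardError and its subclasses only.

```ruby
require 'open-uri'

# Hypothetical helper: true when a bare `rescue` would catch this error class,
# i.e. when the class is StandardError or one of its subclasses.
def caught_by_bare_rescue?(error_class)
  (error_class <= StandardError) == true
end

puts caught_by_bare_rescue?(OpenURI::HTTPError) # true: an ordinary, recoverable error
puts caught_by_bare_rescue?(Interrupt)          # false: raised by Ctrl-C
puts caught_by_bare_rescue?(SystemExit)         # false: raised by exit
```

`rescue Exception` would swallow all three, so a scraper stuck in a retry loop could ignore Ctrl-C. Rescuing the specific class, OpenURI::HTTPError, avoids that.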
Upvotes: 24