Reputation: 1210
I have a Ruby on Rails application that fetches various pages from Yahoo Sports. For some pages, the request fails with the error below. The failure is consistent: any link that fails always fails, so it is not a case of a page working sometimes and failing at other times. The page does exist and loads fine, though, so I'm not sure why I am getting an error. Has anyone seen this behavior before, and if so, do you have any suggestions for getting it to work?
404 => Net::HTTPNotFound for http://sports.yahoo.com/mlb/players/9893/ -- unhandled response
require 'mechanize'
@client = Mechanize.new
# Ask the server not to compress the response
@client.request_headers = { "Accept-Encoding" => "" }
@client.ignore_bad_chunking = true
# works
# url = 'http://sports.yahoo.com/mlb/players/7307'
# doesn't work
url = 'http://sports.yahoo.com/mlb/players/9893'
result = @client.get(url)
Upvotes: 0
Views: 2124
Reputation: 2208
I wasn't able to figure this out with Mechanize alone, but I was able to fetch the URL with HTTParty. If you rescue the Mechanize failure and retry with the redirect URI that HTTParty ends up at, you should be set:
require 'mechanize'
require 'httparty'
@client = Mechanize.new
url = 'http://sports.yahoo.com/mlb/players/9893'
begin
  result = @client.get(url)
rescue Mechanize::ResponseCodeError => e
  # Let HTTParty follow the redirects, then retry Mechanize on the final URI
  redirect_url = HTTParty.get(url).request.last_uri.to_s
  result = @client.get(redirect_url)
end
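The rescue-and-retry idea above can be pulled into a small helper so the fallback does not have to be repeated at every call site. This is only a sketch: fetch_with_fallback, fetcher and resolver are hypothetical names that appear nowhere in Mechanize or HTTParty, and the helper assumes the resolver returns the final URL as a string.

```ruby
# Hypothetical helper (not part of Mechanize or HTTParty): call the primary
# fetcher, and if it raises error_class, ask the resolver for the final URL
# and retry exactly once.
def fetch_with_fallback(url, fetcher:, resolver:, error_class: StandardError)
  fetcher.call(url)
rescue error_class
  final_url = resolver.call(url)
  # Retrying the same URL would fail the same way, so re-raise instead
  raise if final_url.nil? || final_url == url
  fetcher.call(final_url)
end

# Possible wiring with the gems from this answer (assumes both are installed):
#
#   client = Mechanize.new
#   page = fetch_with_fallback(
#     'http://sports.yahoo.com/mlb/players/9893',
#     fetcher:     ->(u) { client.get(u) },
#     resolver:    ->(u) { HTTParty.get(u).request.last_uri.to_s },
#     error_class: Mechanize::ResponseCodeError
#   )
```

Passing the fetcher and resolver in as lambdas keeps the helper independent of either gem, which also makes it easy to exercise without touching the network.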
Upvotes: 2
Reputation: 61
You need to handle the redirect. Mechanize offers a setting for that: follow_meta_refresh. Try adding it to your code. Example:
require 'mechanize'
require 'pp' # only needed on older Rubies; pp is loaded automatically since 2.5
@client = Mechanize.new
@client.request_headers = { "Accept-Encoding" => "" }
@client.ignore_bad_chunking = true
# Follow <meta http-equiv="refresh"> redirects
@client.follow_meta_refresh = true
# works
# url = 'http://sports.yahoo.com/mlb/players/7307'
# doesn't work
url = 'http://sports.yahoo.com/mlb/players/9893'
result = @client.get(url)
pp result
The pp call at the bottom pretty-prints the page object, which is handy for further crawling. The content looks correct on my machine.
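To check whether a redirect is involved at all for a given URL, one option is to follow the Location headers by hand with Ruby's standard library and print every hop. The sketch below is an assumption-laden illustration: trace_redirects and its requester parameter are hypothetical names, and it only sees HTTP-level redirects, not meta refresh tags inside the HTML.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: follow HTTP redirects manually and return every URL
# visited, starting with the one passed in. The requester lambda defaults to
# a plain Net::HTTP GET and can be swapped out for testing.
def trace_redirects(url, limit: 5, requester: ->(uri) { Net::HTTP.get_response(uri) })
  hops = [url]
  limit.times do
    uri = URI.parse(hops.last)
    response = requester.call(uri)
    # Stop at the first response that is not a 3xx redirect
    break unless response.is_a?(Net::HTTPRedirection)
    location = response['location']
    break if location.nil?
    # Location may be relative, so resolve it against the current URL
    hops << URI.join(uri.to_s, location).to_s
  end
  hops
end

# Possible usage (performs real network requests):
#   puts trace_redirects('http://sports.yahoo.com/mlb/players/9893')
```

If the list contains more than one URL, the server is redirecting at the HTTP level; if it contains only the original URL, any redirect would have to come from the page content itself.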
Upvotes: 0