ZenBalance

Reputation: 10317

Ruby Mechanize Connection timed out

I have been practicing writing a number of Ruby scrapers using Mechanize and Nokogiri. However, it seems that after making a certain number of requests (about 14,000 in this case), I get a connection timed out error:

/var/lib/gems/1.8/gems/net-http-persistent-2.5.1/lib/net/http/persistent/ssl_reuse.rb:90:in `initialize': Connection timed out - connect(2) (Errno::ETIMEDOUT)

I have Googled a lot, but the best answer I can find is that I am making too many requests to the server. Is there a way to fix this by throttling or some other method?

Upvotes: 0

Views: 934

Answers (1)

ZenBalance

Reputation: 10317

After some more programming experience, I realized that this was a simple error on my part: my code did not catch the thrown error and move on to the next link when a link was corrupted.

For any novice Ruby programmers that encounter a similar problem:

The Connection timed out error is usually due to an invalid link, etc. on the page being scraped.

You need to wrap the code that accesses the link in a begin/rescue block such as the one below:

begin
  # [1] your scraping code here
rescue StandardError => e
  # [2] code to move to the next link/page/etc. that you are scraping,
  #     instead of sticking on the invalid one
end

For instance, if you have a loop that iterates over links and extracts information from each one, the scraping code goes at [1], and the code to move to the next link (consider using Ruby's `next`) goes at [2]. You might also consider printing something to the console to let the user know that a link was invalid.
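To make that concrete, here is a minimal, network-free sketch of the pattern. The `fetch_page` helper is a hypothetical stand-in for `agent.get(link)` in a real Mechanize scraper; it raises `Errno::ETIMEDOUT` for a "corrupt" link the way a bad link would in practice:

```ruby
# Hypothetical stand-in for agent.get(link): raises on a bad link,
# just as Mechanize would raise Errno::ETIMEDOUT or a response error.
def fetch_page(link)
  raise Errno::ETIMEDOUT, "connect(2)" if link.include?("corrupt")
  "<html>#{link}</html>"
end

links = [
  "http://example.com/a",
  "http://example.com/corrupt",  # this one will raise
  "http://example.com/b"
]

scraped = []

links.each do |link|
  begin
    scraped << fetch_page(link)        # [1] scraping code
  rescue StandardError => e
    warn "Skipping #{link}: #{e.message}"  # tell the user the link was bad
    next                                   # [2] move on instead of crashing
  end
end
```

After the loop, `scraped` holds the two good pages and the corrupt link has been skipped rather than killing the whole run.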

Upvotes: 0
