Reputation: 75
I get a ConnectionPool::Error ("no connections are checked out") trying to scrape a website with Mechanize.
This is my code:
require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.keep_alive = false

page = agent.get('https://web.archive.org/web/20170417084732/https://www.cs.auckland.ac.nz/~andwhay/postlist.html')
page.links_with(:text => 'post').each do |link|
  post = link.click
  Article.create(
    user_id: 1,
    title: post.at('title').text,
    text: post.at("//div[@itemprop = 'description']")
  )
end
I also used the code from the linked blog post to avoid the "Too Many Connection Resets" error.
Upvotes: 3
Views: 797
Reputation: 66
The code from the linked blog post seems to be incompatible with v3.0.0 of the net-http-persistent gem. Note that Mechanize v2.7.6 (the current version as of this writing) is compatible with net-http-persistent >= v2.5.2, which includes v3.0.0.
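Given that incompatibility, one way to sidestep the problem entirely (an assumption on my part, not something covered below) is to pin net-http-persistent below v3.0.0 in your Gemfile, so the blog post's patch keeps the old shutdown behavior:

```ruby
# Gemfile — hypothetical workaround: keep net-http-persistent on the
# pre-connection_pool series that the blog post's patch was written for.
gem 'mechanize', '~> 2.7'
gem 'net-http-persistent', '< 3.0'
```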
The short answer is to remove the call to

self.http.shutdown

on line 44 of the linked blog post.

The long answer is that the net-http-persistent gem started using the connection_pool gem in v3.0.0, which changed the behavior of Net::HTTP::Persistent#shutdown (aka self.http.shutdown in Mechanize::HTTP::Agent). The new behavior raises a ConnectionPool::Error ("no connections are checked out") if a request is made after shutdown has been invoked.
However, looking through the code of both net-http-persistent v2.9.4 and v3.0.0, it seems like self.http.shutdown may not have been necessary in the first place. The main purpose of shutdown seems to be invoking finish on each of the connections. In both v2.9.4 and v3.0.0, when Net::HTTP::Persistent#request rescues an Errno::ECONNRESET exception (the original cause of all this), it retries only once and then calls Net::HTTP::Persistent#request_failed, which in turn calls Net::HTTP::Persistent#finish with the connection. Thus, it seems the only necessary monkey patching is to retry more than once.
Upvotes: 4