I am using Ruby 3.0.0, MetaInspector 5.13.0, and socketry's Async 2.3.0 to scrape metadata from a list of URLs. I set the connection and read timeouts as described in the MetaInspector documentation and test against httpbin's delay endpoint. According to MetaInspector's documentation, I should be able to rescue a MetaInspector::TimeoutError, but instead the Async block fails with an uncaught Timeout::Error, even though I also have a catch-all rescue. Here is my test script, timeout_error.rb:
require 'async'
require 'metainspector'

urls = ["https://httpbin.org/delay/15", "https://www.google.com"]

Async do
  urls.each do |url|
    Async do
      begin
        page = MetaInspector.new(url, connection_timeout: 10, read_timeout: 5, allow_non_html_content: true)
      rescue MetaInspector::TimeoutError
        puts "rescued MetaInspector's TimeoutError"
      rescue => e
        puts "rescued everything else, type: #{e.class}, message: #{e.message}"
      end
      puts "made it to the end of the nested Async block with #{url}"
    end
  end
end
Here is my output from the shell:
meta_collector git:(master) ✗ ruby -v
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin21]
meta_collector git:(master) ✗ ruby bin/timeout_error.rb
made it to the end of the nested Async block with https://www.google.com
.../.rvm/gems/ruby-3.0.0/gems/async-2.3.0/lib/async/scheduler.rb:213:in `select': execution expired (Timeout::Error)
from .../.rvm/gems/ruby-3.0.0/gems/async-2.3.0/lib/async/scheduler.rb:213:in `run_once'
from .../.rvm/gems/ruby-3.0.0/gems/async-2.3.0/lib/async/scheduler.rb:232:in `run'
from .../.rvm/gems/ruby-3.0.0/gems/async-2.3.0/lib/kernel/async.rb:32:in `Async'
from bin/timeout_error.rb:6:in `<main>'
I expect to see one "rescued everything else..." message alongside two "made it to the end of the nested..." messages, but the rescue message and the second "made it to the end..." message never appear. Does anyone have any insights here?
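For comparison, here is a minimal sketch of why I expect the catch-all to fire. It uses only Ruby's standard `timeout` library, with no Async or MetaInspector involved. The `slow_fetch` and `attempt` helpers are hypothetical names made up for this illustration. The point is only that a bare `rescue => e` does catch a `Timeout::Error` when the error is raised inside the `begin` block, which makes me suspect that the error in the trace above is raised somewhere outside my block (the backtrace points at the scheduler's `select`).

```ruby
require 'timeout'

# Hypothetical stand-in for a slow HTTP request; it is not MetaInspector code.
def slow_fetch(seconds)
  sleep(seconds)
  :done
end

# Runs slow_fetch under a time limit and reports which branch handled it.
def attempt(limit:, duration:)
  Timeout.timeout(limit) { slow_fetch(duration) }
  "finished"
rescue => e
  # A bare rescue catches StandardError and its subclasses,
  # which includes Timeout::Error.
  "rescued everything else, type: #{e.class}"
end

puts attempt(limit: 0.1, duration: 1) # → rescued everything else, type: Timeout::Error
puts attempt(limit: 1, duration: 0)   # → finished
```

This says nothing about how Async or MetaInspector raise their timeouts internally. It only shows that the rescue clauses themselves behave as I expected when the exception originates inside the block.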