Reputation: 11108
Is there a way in Python 3.3.0 to set a timeout for the robotparser read() function, like the timeout argument of urllib.request.urlopen()?
The default timeout of 60 seconds is a bit drastic.
(I'm teaching myself Python.)
Upvotes: 3
Views: 829
Reputation: 1124000
No, you'd have to either set the global default timeout with socket.setdefaulttimeout(), or subclass the RobotFileParser class to add a custom timeout:
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser


class TimeoutRobotFileParser(RobotFileParser):
    def __init__(self, url='', timeout=60):
        super().__init__(url)
        self.timeout = timeout  # seconds, passed to urlopen() in read()

    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        try:
            f = urllib.request.urlopen(self.url, timeout=self.timeout)
        except urllib.error.HTTPError as err:
            if err.code in (401, 403):
                self.disallow_all = True
            elif err.code >= 400:
                self.allow_all = True
        else:
            raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())
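If you'd rather go with the first option, the global default, here is a minimal sketch of one way to do it. The names default_socket_timeout and fetch_robots are made up for illustration and are not part of the standard library. The idea is to set the global timeout only around the read() call and put the old value back afterwards, so other sockets in the program are not affected for longer than necessary.

```python
import socket
from contextlib import contextmanager
from urllib.robotparser import RobotFileParser


@contextmanager
def default_socket_timeout(seconds):
    """Hypothetical helper: temporarily change the global socket timeout."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Restore whatever was configured before, even if read() raised.
        socket.setdefaulttimeout(previous)


def fetch_robots(url, seconds=10.0):
    """Hypothetical helper: read robots.txt with a temporary global timeout."""
    parser = RobotFileParser(url)
    with default_socket_timeout(seconds):
        parser.read()  # urlopen() falls back to the global default timeout
    return parser
```

Keep in mind that the global default applies to every new socket created while it is set, including those in other threads, which is why the subclass is usually the cleaner choice. A timeout is reported as an exception (typically urllib.error.URLError or socket.timeout), so the caller still has to catch it.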
Upvotes: 6