Terence

Reputation: 11108

Python Robotparser Timeout equivalent

Is there a way in Python 3.3.0 to set the timeout of the robotparser read() function, similar to the timeout argument of urllib.request.urlopen()?

The default timeout of 60 seconds is a bit drastic.

(I'm teaching myself Python.)

Python 3.3.0 - robotparser

Python 3.3.0 - urllib.request

Upvotes: 3

Views: 829

Answers (1)

Martijn Pieters

Reputation: 1124000

No, you'd have to either set the global default timeout with socket.setdefaulttimeout(), or subclass the RobotFileParser class to add a custom timeout:

from urllib.robotparser import RobotFileParser
import urllib.error
import urllib.request

class TimeoutRobotFileParser(RobotFileParser):
    def __init__(self, url='', timeout=60):
        super().__init__(url)
        # Timeout in seconds, passed to urlopen() in read()
        self.timeout = timeout

    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        try:
            f = urllib.request.urlopen(self.url, timeout=self.timeout)
        except urllib.error.HTTPError as err:
            # Mirror the base class: 401/403 means "disallow all",
            # any other HTTP error >= 400 means "allow all"
            if err.code in (401, 403):
                self.disallow_all = True
            elif err.code >= 400:
                self.allow_all = True
        else:
            raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())
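
For the first option, the global default timeout, a minimal sketch might look like the following. The helper name read_robots_with_global_timeout and the 10 second default are assumptions made for illustration, not part of the standard library. Because socket.setdefaulttimeout() affects every socket created afterwards in the whole process (other threads included), the sketch restores the previous value once the read is done.

```python
import socket
from urllib.robotparser import RobotFileParser


def read_robots_with_global_timeout(url, timeout=10.0):
    """Read and parse robots.txt while a temporary global socket timeout is set.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    # Remember the current process-wide default so it can be restored
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)
    try:
        parser = RobotFileParser(url)
        # read() calls urlopen() without a timeout argument, so the
        # global default set above applies to the connection
        parser.read()
        return parser
    finally:
        # Restore the old default even if read() raised (e.g. on timeout)
        socket.setdefaulttimeout(previous)
```

If the server does not respond in time, read() raises an exception (a urllib.error.URLError or a socket timeout error, depending on where the timeout occurs), so the caller should be prepared to catch it.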

Upvotes: 6
