Kurt Peek

How to fix a circular import when inheriting from Scrapy's RetryMiddleware class?

I'm trying to adapt Scrapy's RetryMiddleware class, overriding the _retry method with a copy-pasted version in which I just add one additional line. I tried starting my custom middleware module as follows:

import scrapy.downloadermiddlewares.retry
from scrapy.utils.python import global_object_name

However, this gives rise to an

ImportError: cannot import name global_object_name

According to ImportError: Cannot import name X, this type of error is caused by circular imports, but in this case I cannot easily remove dependencies in Scrapy's source code. How can I fix this?

For the sake of completeness, here is the TorRetryMiddleware I'm trying to implement:

import logging
import scrapy.downloadermiddlewares.retry
from scrapy.utils.python import global_object_name
import apkmirror_scraper.tor_controller as tor_controller

logger = logging.getLogger(__name__)

class TorRetryMiddleware(scrapy.downloadermiddlewares.retry.RetryMiddleware):
    def __init__(self, settings):
        super(TorRetryMiddleware, self).__init__(settings)
        self.retry_http_codes = {403, 429}                  # Retry on 403 ('Forbidden') and 429 ('Too Many Requests')

    def _retry(self, request, reason, spider):
        '''Same as original '_retry' method, but with a call to 'change_identity' before returning the Request.'''
        retries = request.meta.get('retry_times', 0) + 1

        stats = spider.crawler.stats
        if retries <= self.max_retry_times:
            logger.debug("Retrying %(request)s (failed %(retries)d times): %(reason)s",
                         {'request': request, 'retries': retries, 'reason': reason},
                         extra={'spider': spider})
            retryreq = request.copy()
            retryreq.meta['retry_times'] = retries
            retryreq.dont_filter = True
            retryreq.priority = request.priority + self.priority_adjust

            if isinstance(reason, Exception):
                reason = global_object_name(reason.__class__)

            stats.inc_value('retry/count')
            stats.inc_value('retry/reason_count/%s' % reason)

            tor_controller.change_identity()    # This line is added to the original '_retry' method      

            return retryreq
        else:
            stats.inc_value('retry/max_reached')
            logger.debug("Gave up retrying %(request)s (failed %(retries)d times): %(reason)s",
                         {'request': request, 'retries': retries, 'reason': reason},
                         extra={'spider': spider})

Answers (1)

starrify

I personally don't think this ImportError comes from circular imports. Instead, it's highly likely that your version of Scrapy does not yet contain scrapy.utils.python.global_object_name.
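
A quick way to test this is to check whether the installed module actually defines the name, and which Scrapy version is installed. This is a sketch that assumes Python 3.8+ (for `importlib.metadata`); `has_name` and `installed_version` are illustrative helpers, not part of Scrapy:

```python
import importlib
from importlib import metadata  # Python 3.8+


def has_name(module_name, attr):
    """Return True if `module_name` imports cleanly and defines `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers a missing module (ModuleNotFoundError) as well as
        # ImportErrors raised while the module itself is being imported.
        return False
    return hasattr(module, attr)


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Scrapy version:", installed_version("Scrapy"))
    print("has global_object_name:",
          has_name("scrapy.utils.python", "global_object_name"))
```

If the second line prints `False` while Scrapy itself imports fine, the name is simply missing from your version.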

scrapy.utils.python.global_object_name was only added in this commit, which is not yet part of any release (the latest release is v1.3.3); it is targeted for v1.4, though.

Please verify that you are running Scrapy from the GitHub source and that your checkout includes that commit.
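
If you would rather stay on a released version, one workaround (a sketch, not an official Scrapy API) is to fall back to a local definition when the import fails. The fallback assumes the upstream helper just joins the object's `__module__` and `__name__`, which is all `_retry` needs to build its stats key:

```python
try:
    # Only present in Scrapy versions that include the commit mentioned above.
    from scrapy.utils.python import global_object_name
except ImportError:
    # Assumed equivalent of the upstream helper, for older Scrapy releases.
    def global_object_name(obj):
        """Return the dotted global name of `obj`, e.g. 'builtins.ValueError' on Python 3."""
        return "%s.%s" % (obj.__module__, obj.__name__)
```

With this at the top of the middleware module, `global_object_name(reason.__class__)` inside `_retry` should behave the same whether or not your Scrapy version ships the helper.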

EDITED:

Regarding:

According to ImportError: Cannot import name X, this type of error is caused by circular imports,

An ImportError can have many causes, and the stack trace is usually enough to determine the root cause. For example, a module that does not exist:

>>> import no_such_name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named no_such_name
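
The output above appears to be from Python 2, where both kinds of failure are a plain `ImportError`. On Python 3.6+, a missing module raises the subclass `ModuleNotFoundError`, whereas a name that cannot be imported from an existing module (the situation in the question) raises a plain `ImportError`. A small illustrative helper (`classify_import` is hypothetical) makes the difference visible:

```python
def classify_import(statement):
    """Run an import statement and report which kind of ImportError, if any, it raises."""
    try:
        # Fresh globals so the import doesn't leak into this module's namespace.
        exec(statement, {})
    except ModuleNotFoundError:
        # Python 3.6+: the module itself could not be found.
        return "missing module"
    except ImportError:
        # The module was found, but a name could not be imported from it,
        # e.g. because it does not exist there or because of a circular import.
        return "missing name"
    return "ok"


if __name__ == "__main__":
    for stmt in ("import os",
                 "import no_such_name",
                 "from os import no_such_name"):
        print(stmt, "->", classify_import(stmt))
```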

A circular import, by contrast, produces a quite different stack trace, e.g.:

[pengyu@GLaDOS-Precision-7510 tmp]$ cat foo.py 
from bar import baz
baz = 1
[pengyu@GLaDOS-Precision-7510 tmp]$ cat bar.py 
from foo import baz
baz = 2
[pengyu@GLaDOS-Precision-7510 tmp]$ python -c "import foo"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/foo.py", line 1, in <module>
    from bar import baz
  File "/tmp/bar.py", line 1, in <module>
    from foo import baz
ImportError: cannot import name 'baz'
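
The same foo/bar experiment can be reproduced from Python itself. This sketch writes the two modules to a temporary directory (renamed `circ_foo`/`circ_bar`, a hypothetical choice to avoid clashing with any installed `foo` or `bar`) and returns the `ImportError` raised by the circular `from ... import`. Note that the traceback points back into the first module, which is the telltale sign; recent Python 3 versions also mention a "partially initialized module" in the message.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reproduce_circular_import():
    """Recreate the foo/bar circular import and return the ImportError it raises."""
    with tempfile.TemporaryDirectory() as tmp:
        # circ_foo imports from circ_bar, which imports back from circ_foo
        # before circ_foo has had a chance to define `baz`.
        Path(tmp, "circ_foo.py").write_text("from circ_bar import baz\nbaz = 1\n")
        Path(tmp, "circ_bar.py").write_text("from circ_foo import baz\nbaz = 2\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("circ_foo")
        except ImportError as exc:
            return exc
        finally:
            # Undo side effects so the experiment can be repeated cleanly.
            sys.path.remove(tmp)
            sys.modules.pop("circ_foo", None)
            sys.modules.pop("circ_bar", None)
    return None


if __name__ == "__main__":
    print(repr(reproduce_circular_import()))
```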
