Acorn

Reputation: 50497

How to manually add URLs to DupeFilter from Spider

I'm struggling to find a way to access the DupeFilter object from within my Spider.

If I could access it, I could just add another fingerprint to its fingerprints set.

Upvotes: 0

Views: 91

Answers (1)

Acorn

Reputation: 50497

It looks like you have to dig pretty deep to get to the DupeFilter. From inside the spider it is reachable as self.crawler.engine.slot.scheduler.df

So adding a fingerprint would look like this:

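from scrapy import Request

def parse_page(self, response):
    # ...

    # The dupe filter hangs off the scheduler of the running engine.
    dupe_filter = self.crawler.engine.slot.scheduler.df
    # Build a throwaway request only to compute its fingerprint.
    dummy_request = Request('http://example.com/thing/9964')
    fingerprint = dupe_filter.request_fingerprint(dummy_request)
    # Mark the URL as already seen so later requests for it get filtered.
    dupe_filter.fingerprints.add(fingerprint)

    # ...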
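
To see why adding a fingerprint is enough, here is a rough, standard-library-only sketch of how a fingerprint-based filter behaves. ToyDupeFilter and its method signatures are hypothetical stand-ins I made up for illustration, not Scrapy's actual implementation. As far as I know the real filter also canonicalizes the URL and hashes the request body, which this toy version skips.

```python
import hashlib


class ToyDupeFilter:
    """Hypothetical, simplified model of a fingerprint-based duplicate filter."""

    def __init__(self):
        # Fingerprints of every request considered "already seen".
        self.fingerprints = set()

    def request_fingerprint(self, method, url):
        # Assumption: hash only the method and the raw URL.
        raw = f"{method.upper()} {url}".encode("utf-8")
        return hashlib.sha1(raw).hexdigest()

    def request_seen(self, method, url):
        # Return True for a known fingerprint, otherwise record it.
        fingerprint = self.request_fingerprint(method, url)
        if fingerprint in self.fingerprints:
            return True
        self.fingerprints.add(fingerprint)
        return False


dupe_filter = ToyDupeFilter()

# Manually register a URL that was never actually requested.
dupe_filter.fingerprints.add(
    dupe_filter.request_fingerprint("GET", "http://example.com/thing/9964")
)

# A later request for that URL is now treated as a duplicate.
already_seen = dupe_filter.request_seen("GET", "http://example.com/thing/9964")
# An unrelated URL is still let through on its first visit.
first_visit = dupe_filter.request_seen("GET", "http://example.com/thing/1")
```

Since the filter only ever checks set membership, pre-seeding the set has the same effect as if the request had already gone through the scheduler.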

Upvotes: 1
