Reputation: 131
I'm creating a custom duplicate filter by inheriting from RFPDupeFilter.
Here is the link to the code I'm using:
https://github.com/j4s0nh4ck/wiki-spider/blob/master/wiki/wiki/SeenURLFilter.py
Note: I have the code from that link in a file named custom_filters.py, in the same directory as settings.py, and then in settings.py I have this line:
DUPEFILTER_CLASS = 'myspider.custom_filters.SeenURLFilter'
But when I run the bot, I get this error:

exceptions.TypeError: __init__() takes exactly 1 argument (3 given)
Upvotes: 1
Views: 673
Reputation: 474001
As you can see in the traceback, the from_settings() method of your filter is called to create an instance of your custom dupe filter. But since you don't define your own from_settings() method, the one from the built-in RFPDupeFilter is used:
@classmethod
def from_settings(cls, settings):
    debug = settings.getbool('DUPEFILTER_DEBUG')
    return cls(job_dir(settings), debug)
which tries to instantiate your custom dupe filter with path and debug as constructor arguments. Your SeenURLFilter constructor does not accept those arguments.
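Judging from the traceback, the constructor in your custom_filters.py presumably looks something like this (a reconstruction, not your actual code - that is at the link in the question):

from scrapy.dupefilter import RFPDupeFilter

class SeenURLFilter(RFPDupeFilter):
    """A dupe filter that considers only the URL."""
    def __init__(self):  # takes no arguments besides self
        self.urls_seen = set()
        RFPDupeFilter.__init__(self)

so the cls(job_dir(settings), debug) call fails with the TypeError shown above.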
You need to have your dupe filter accept the path and debug parameters as well:
from scrapy.dupefilter import RFPDupeFilter


class SeenURLFilter(RFPDupeFilter):
    """A dupe filter that considers the URL"""

    def __init__(self, path=None, debug=False):   # FIX WAS APPLIED HERE
        self.urls_seen = set()
        RFPDupeFilter.__init__(self, path, debug)  # AND HERE

    def request_seen(self, request):
        if request.url in self.urls_seen:
            return True
        else:
            self.urls_seen.add(request.url)
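A quick way to sanity-check the constructor outside of a crawl is to call it the same way from_settings() does, with positional path and debug arguments (a minimal sketch; FakeRequest is just a stand-in for a real scrapy.Request):

class FakeRequest(object):
    """Stand-in for scrapy.Request; only the url attribute is needed here."""
    url = 'http://example.com/page'

dupe_filter = SeenURLFilter(None, False)        # same call shape as cls(job_dir(settings), debug)
print(dupe_filter.request_seen(FakeRequest()))  # falsy (None) the first time the URL is seen
print(dupe_filter.request_seen(FakeRequest()))  # True for the duplicate

Note that in newer Scrapy versions the module was renamed to scrapy.dupefilters, so the import line may need adjusting depending on your Scrapy version.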
Upvotes: 1