Reputation: 311
I have implemented my own function for excluding URLs that contain certain words. However, when I call it inside my parse method, Scrapy reports that the function is not defined, even though it is. I didn't use Rule objects because I get the URLs I want to scrape from an API. Here is my setup:
class IbmSpiderSpider(scrapy.Spider):
    ...
    def checkUrlForWords(text):
        ...
        return flag
    def parse(self, response):
        data = json.loads(response.body)
        results = data.get('resultset').get('searchresults').get('searchresultlist')
        for result in results:
            url = result.get('url')
            if (checkUrlForWords(url)==True): continue
            yield scrapy.Request(url, self.parse_content, meta={'title': result.get('title')})
Please help
Upvotes: 1
Views: 651
Reputation: 3561
You can also define your function outside the class, in the same .py file:
def checkUrlForWords(text):
    ...
    return flag
class IbmSpiderSpider(scrapy.Spider):
    ...
    def parse(self, response):
        data = json.loads(response.body)
        results = data.get('resultset').get('searchresults').get('searchresultlist')
        for result in results:
            url = result.get('url')
            if (checkUrlForWords(url)==True): continue
            ....
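For illustration, a module-level helper could look like the sketch below. The body of checkUrlForWords is elided in the question, so the word list EXCLUDED_WORDS, the example URLs, and the case-insensitive substring match are all assumptions, not the asker's actual logic.

```python
# Hypothetical word list; the question does not show the real one.
EXCLUDED_WORDS = ("login", "signup", "careers")


def checkUrlForWords(text):
    """Return True if the URL contains any excluded word (assumed behavior)."""
    lowered = text.lower()
    return any(word in lowered for word in EXCLUDED_WORDS)


# Made-up URLs standing in for the API results.
urls = [
    "https://example.com/products/server",
    "https://example.com/Login?next=/",
]

# Keep only URLs that do not contain an excluded word.
kept = [url for url in urls if not checkUrlForWords(url)]
print(kept)
```

Because the function lives at module level, any method in the same file can call it by its bare name, which is why parse works unchanged in this variant.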
Upvotes: 1
Reputation: 3717
Use self.checkUrlForWords, since this is a method inside the class. Calling plain checkUrlForWords raises a NameError, because names defined in the class body are not looked up from inside its methods. Add self to the method's parameters and to the call:
def checkUrlForWords(self, text):
    ...
    return flag
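The difference can be reproduced without Scrapy. The sketch below uses two hypothetical classes, Broken and Fixed, with a made-up check method; only the way the method is called differs.

```python
class Broken:
    def check(self, text):
        return "skip" in text

    def parse(self, url):
        # Bare name: Python searches local, enclosing-function, global and
        # builtin scopes, but not the class body, so this lookup fails.
        return check(url)


class Fixed:
    def check(self, text):
        return "skip" in text

    def parse(self, url):
        # Attribute lookup on the instance finds the method.
        return self.check(url)


try:
    Broken().parse("https://example.com/skip")
    broken_error = None
except NameError as exc:
    broken_error = exc

print(type(broken_error).__name__)
print(Fixed().parse("https://example.com/skip"))
```

The error only appears when parse actually runs, which is why the class definition itself loads without complaint.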
Upvotes: 2
Reputation: 757
Your function is defined inside your class. Use:
IbmSpiderSpider.checkUrlForWords(url)
Your function looks like a static method. You can add the @staticmethod decorator so that it can also be called as self.checkUrlForWords:
class IbmSpiderSpider(scrapy.Spider):
    ...
    @staticmethod
    def checkUrlForWords(text):
        ...
        return flag
    def parse(self, response):
        data = json.loads(response.body)
        results = data.get('resultset').get('searchresults').get('searchresultlist')
        for result in results:
            url = result.get('url')
            if (self.checkUrlForWords(url)==True): continue
            yield scrapy.Request(url, self.parse_content, meta={'title': result.get('title')})
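As a small check of how @staticmethod behaves, the sketch below uses a hypothetical UrlFilter class in place of the spider. The BLOCKED tuple and the keep method are assumptions made for the example; the point is only that a static method can be reached through both the class and an instance.

```python
class UrlFilter:
    # Hypothetical stand-in for the spider; no Scrapy import is needed.
    BLOCKED = ("archive",)

    @staticmethod
    def checkUrlForWords(text):
        # A static method receives no implicit self or cls argument.
        return any(word in text for word in UrlFilter.BLOCKED)

    def keep(self, urls):
        # Called through the instance, as in the answer above.
        return [u for u in urls if not self.checkUrlForWords(u)]


url_filter = UrlFilter()
via_class = UrlFilter.checkUrlForWords("https://example.com/archive/2019")
via_instance = url_filter.checkUrlForWords("https://example.com/archive/2019")
kept_urls = url_filter.keep(
    ["https://example.com/archive/2019", "https://example.com/news"]
)
print(via_class, via_instance, kept_urls)
```

A static method suits a helper like this because it does not read or modify any spider state.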
Upvotes: 1