Tobi

Reputation: 311

Scrapy doesn't find custom function

I have implemented my own function for excluding URLs that contain certain words. However, when I call it inside my parse method, Scrapy tells me that the function is not defined, even though it is. I am not using a Rule object because I get the URLs I want to scrape from an API. Here is my setup:

class IbmSpiderSpider(scrapy.Spider):
    ...

    def checkUrlForWords(text):
        ...
        return flag

    def parse(self, response):
        data = json.loads(response.body)
        results = data.get('resultset').get('searchresults').get('searchresultlist')
        for result in results:
            url = result.get('url')
            if (checkUrlForWords(url)==True): continue
        yield scrapy.Request(url, self.parse_content, meta={'title': result.get('title')})

Please help

Upvotes: 1

Views: 651

Answers (3)

Georgiy

Reputation: 3561

You can also define your function outside your class, at module level in the same .py file. It can then be called by its plain name from inside parse:

def checkUrlForWords(text):
    ...
    return flag

class IbmSpiderSpider(scrapy.Spider):
    ...

    def parse(self, response):
        data = json.loads(response.body)
        results = data.get('resultset').get('searchresults').get('searchresultlist')
        for result in results:
            url = result.get('url')
            if checkUrlForWords(url):
                continue
        ...
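
For illustration, here is a minimal, self-contained sketch of that layout. The body of the check is elided in the question, so the EXCLUDED_WORDS tuple, the case-insensitive substring match and the filter_urls stand-in below are assumptions, not the asker's actual logic. The Scrapy parts are left out so the snippet runs on its own.

```python
# Hypothetical word list; the question does not show the real one.
EXCLUDED_WORDS = ("login", "signup")


def checkUrlForWords(text):
    """Return True if the URL contains any excluded word (assumed logic)."""
    lowered = text.lower()
    return any(word in lowered for word in EXCLUDED_WORDS)


def filter_urls(urls):
    """Stand-in for the loop in parse(): keep only URLs that pass the check."""
    kept = []
    for url in urls:
        if checkUrlForWords(url):
            continue  # module-level name, so no self. or class prefix is needed
        kept.append(url)
    return kept
```

A side benefit of this layout is that the helper can be imported and tested without creating a spider.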

Upvotes: 1

vezunchik

Reputation: 3717

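Use self.checkUrlForWords, since this is a method defined inside the class. Calling plain checkUrlForWords raises a NameError, because names defined in a class body are not visible as bare names inside its methods. Add self as the first parameter of the method and call it through self: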

def checkUrlForWords(self, text):
    ...
    return flag
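
To see why the bare name fails, here is a small sketch that uses a plain class instead of a Scrapy spider (the lookup rule is the same). DemoSpider, blocked_words and the substring check are made up for the example. Inside a method, Python resolves a bare name in the local, enclosing-function, module and builtin scopes, but not in the class body, so only the self. call finds the method.

```python
class DemoSpider:
    # Hypothetical word list; the real check is elided in the question.
    blocked_words = ("login", "signup")

    def checkUrlForWords(self, text):
        """Return True if the URL contains any blocked word (assumed logic)."""
        return any(word in text for word in self.blocked_words)

    def broken_filter(self, urls):
        kept = []
        for url in urls:
            # Bare name: not found in any scope Python searches -> NameError.
            if checkUrlForWords(url):
                continue
            kept.append(url)
        return kept

    def working_filter(self, urls):
        kept = []
        for url in urls:
            # Attribute lookup on the instance finds the method.
            if self.checkUrlForWords(url):
                continue
            kept.append(url)
        return kept
```

Calling broken_filter on a non-empty list reproduces the "not defined" error from the question, while working_filter returns only the URLs that pass the check.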

Upvotes: 2

gaFF

Reputation: 757

Your function is defined inside your class. Use:

IbmSpiderSpider.checkUrlForWords(url)

Your function looks like a static method, since it does not use self. You can add the @staticmethod decorator and then call it with self.checkUrlForWords:

class IbmSpiderSpider(scrapy.Spider):
    ...

    @staticmethod
    def checkUrlForWords(text):
        ...
        return flag

    def parse(self, response):
        data = json.loads(response.body)
        results = data.get('resultset').get('searchresults').get('searchresultlist')
        for result in results:
            url = result.get('url')
            # Skip URLs that contain an excluded word.
            if self.checkUrlForWords(url):
                continue
            # Yield inside the loop so every accepted URL is requested, not only the last one.
            yield scrapy.Request(url, self.parse_content, meta={'title': result.get('title')})
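
As a rough check of what the decorator changes, here is a sketch with two plain classes (no Scrapy needed). The class names and the "login" substring test are assumptions for the example, not the asker's code. It shows that the class-qualified call works either way in Python 3, while a call through an instance only works with @staticmethod.

```python
class WithDecorator:
    @staticmethod
    def checkUrlForWords(text):
        # Assumed check; the question elides the real body.
        return "login" in text


class WithoutDecorator:
    def checkUrlForWords(text):
        return "login" in text


url = "https://example.com/login"

# With @staticmethod, both call styles pass only the URL.
via_class = WithDecorator.checkUrlForWords(url)
via_instance = WithDecorator().checkUrlForWords(url)

# Without the decorator, the class-qualified call still works in Python 3...
plain_via_class = WithoutDecorator.checkUrlForWords(url)

# ...but an instance call passes the instance as an extra first argument.
try:
    WithoutDecorator().checkUrlForWords(url)
    instance_call_failed = False
except TypeError:
    instance_call_failed = True
```

So the decorator is only needed if you want to call the function through self.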

Upvotes: 1
