DjangoPy

Reputation: 865

SgmlLinkExtractor and regular expression for matching a word in a string

I'm using SgmlLinkExtractor in Scrapy to extract specific URLs.

I override the start_requests function to crawl dynamic URLs.

It looks like this:

def start_requests(self):
    .....
    yield Request(url.strip(), callback=self.callbackA)

Callback A does nothing right now.

I also implemented process_value for the SgmlLinkExtractor, but it is never called.

This is the rule I'm using:

rules = [Rule(SgmlLinkExtractor(allow=()), callback=callbackB, follow=True),]

Again, callbackB is never called.

Upvotes: 0

Views: 415

Answers (1)

Steven Almeroth

Reputation: 8202

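If your callbacks are declared in your spider, they do not have global scope, so the bare name callbackB will not resolve to the method. Since rules is a class attribute, self is not available there either; pass the method name as a string and Scrapy looks it up on the spider instance:

rules = [
  Rule(SgmlLinkExtractor(), callback='callbackB', follow=True),
]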
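
To illustrate how a rule callback can be tied to the spider class, here is a minimal sketch that uses stand-in classes instead of Scrapy itself. The names Rule, DemoSpider, resolve_callback and self_is_defined_in_class_body are hypothetical and only mimic the idea; this is not Scrapy's own code. It assumes the callback is given as a string name and looked up on the instance with getattr, and it also checks that self is not defined while a class body is being evaluated.

```python
class Rule:
    """Hypothetical stand-in for a crawl rule; it only stores its arguments."""

    def __init__(self, callback=None, follow=False):
        self.callback = callback
        self.follow = follow


def resolve_callback(spider, rule):
    """Turn a string callback name into a bound method of the spider.

    Callables are returned unchanged (assumed behaviour for this sketch).
    """
    callback = rule.callback
    if isinstance(callback, str):
        return getattr(spider, callback)
    return callback


class DemoSpider:
    # This list is built while the class body runs, before any instance
    # exists, so the callback is named by string instead of through self.
    rules = [Rule(callback="callbackB", follow=True)]

    def callbackB(self, response):
        return "callbackB got " + response


def self_is_defined_in_class_body():
    """Return True if a class body can reference self, False otherwise."""
    source = "class Broken:\n    rules = [self.callbackB]\n"
    try:
        # Run in an empty namespace so no outer 'self' can leak in.
        exec(source, {})
    except NameError:
        return False
    return True


spider = DemoSpider()
bound = resolve_callback(spider, DemoSpider.rules[0])
print(bound("page"))
print(self_is_defined_in_class_body())
```

Running the sketch prints "callbackB got page" and then False: the string name resolves to the bound method once an instance exists, while a reference to self inside the class body fails with a NameError.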

Upvotes: 0
