Reputation: 21
When I run this spider from the terminal, it only crawls the first page; it never follows any other links from the start URL. I'm not good with regular expressions, so could that be the problem? I was following a YouTube tutorial whose code is almost identical to mine, and that worked perfectly, so I'm not sure what the issue is here.
from scrapy.selector import Selector
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from ScrapBooks.items import ScrapbooksItem

class AlibrisspiderSpider(CrawlSpider):
    name = "as"
    allowed_domains = ["alibris.com"]
    start_urls = ["https://www.alibris.com/search/books/subject/mystery/"]
    rules = (
        Rule(SgmlLinkExtractor(allow="www\.alibris\.com.*"),
             callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        sel = Selector(response)
        item = ScrapbooksItem()
        item['URL'] = response.request.url
        item['bookLink'] = sel.xpath('//*[@id="selected-works"]/ul/li/a').extract()
        self.log("********* Inside Parse Method ********")
        return item
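Regarding the regex question: the allow pattern of a link extractor is matched against each candidate URL with a regular-expression search, so it can be sanity-checked outside Scrapy with Python's re module. A quick standalone check (the sample URLs are illustrative, not from the site's actual link structure):

```python
import re

# The same pattern used in the spider's Rule
pattern = r"www\.alibris\.com.*"

urls = [
    "https://www.alibris.com/search/books/subject/mystery/",  # should match
    "https://www.example.com/other",                          # should not
]

for url in urls:
    print(url, "->", bool(re.search(pattern, url)))
```

If the pattern fails to match the links you expect the spider to follow, the rule will never fire, so this kind of check is a cheap first debugging step.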
Below is my items.py file:
import scrapy
from scrapy.item import Item, Field

class ScrapbooksItem(Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    URL = Field()
    bookLink = Field()
Upvotes: 0
Views: 63
Reputation: 5461
Don't return the item, yield it.
Use yield instead of return at the end of parse_item.
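The difference matters because return ends the callback after producing a single value, while yield turns it into a generator that can emit every item it finds. A generic Python sketch of the two behaviors (hypothetical function names, not the actual spider code):

```python
# Hypothetical callbacks illustrating return vs. yield.
def callback_with_return(items):
    # return hands back the first value and exits immediately
    for item in items:
        return item

def callback_with_yield(items):
    # yield produces each item in turn
    for item in items:
        yield item

print(callback_with_return([1, 2, 3]))       # 1
print(list(callback_with_yield([1, 2, 3])))  # [1, 2, 3]
```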
Upvotes: 1