Edwin Solis

Reputation: 21

Scrapy python code is not recursively going through links

When I run this code in the terminal, it only goes through the first page; it doesn't follow any other links from the start URL. I'm not good with regular expressions, so could that be the problem? I was following a YouTube tutorial whose code is almost identical to mine, and that worked perfectly, so I'm not sure what the issue is here.

from scrapy.selector import Selector
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from ScrapBooks.items import ScrapbooksItem

class AlibrisspiderSpider(CrawlSpider):

    name = "as"
    allowed_domains = ["alibris.com"]

    start_urls = ["https://www.alibris.com/search/books/subject/mystery/"]

    rules = (
        Rule(SgmlLinkExtractor(allow=r"www\.alibris\.com.*"),
             callback="parse_item", follow=True),
    )


    def parse_item(self, response):
        sel = Selector(response)
        item = ScrapbooksItem()
        item['URL'] = response.request.url
        item['bookLink'] = sel.xpath('//*[@id="selected-works"]/ul/li/a').extract()
        self.log("********* Inside Parse Method ********")
        return item

Below is my items.py class


import scrapy
from scrapy.item import Item, Field

class ScrapbooksItem(Item):
    # define the fields for your item here like:
    # name = scrapy.Field()

    URL = Field()
    bookLink = Field()

Upvotes: 0

Views: 63

Answers (1)

Yash Pokar

Reputation: 5461

Don't return the item, yield it.

Use `yield` instead of `return` at the end of `parse_item`.
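A minimal sketch of the change, with the Scrapy-specific pieces stubbed out so it stands alone (here `response` is a plain dict stand-in, not a real Scrapy `Response`, and the item is a dict rather than `ScrapbooksItem`):

```python
def parse_item(response):
    # Build the item as in the original method
    item = {}
    item['URL'] = response['url']
    item['bookLink'] = response.get('links', [])
    # yield instead of return: the callback becomes a generator,
    # which is the idiomatic way to emit items from a Scrapy callback
    yield item

# The callback now produces an iterable of items
items = list(parse_item({'url': 'https://www.alibris.com/', 'links': []}))
```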

Upvotes: 1

Related Questions