Thé Generous

Reputation: 15

Scrapy crawls only one page

This is my code. The spider doesn't crawl the URLs, or doesn't extract them, or something like that. If I put a target URL in start_urls, then Scrapy finds the item but won't crawl forward; and if I put the URL that contains the target list in start_urls, the result is 0 items. :) I hope that isn't too confusing.

from scrapy.spiders import Spider
from testing.items import TestingItem
import scrapy

class MySpider(scrapy.Spider):
    name            = 'testing'
    allowed_domains = ['http://somewebsite.com']
    start_urls      = ['http://somewebsite.com/listings.php']

    def parse(self, response):
        for href in response.xpath('//h5/a/@href'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_item)

    def parse_item(self, response):
        titles = response.xpath('//*[@class="panel-content user-info"]').extract()
        for title in titles:
            item = TestingItem()
            item["nimi"] = response.xpath('//*[@class="seller-info"]/h3/text()').extract()
            yield item

Upvotes: 0

Views: 431

Answers (1)

Adrien Blanquer

Reputation: 2061

You need to remove the http:// in allowed_domains: it expects bare domain names (e.g. somewebsite.com), not URLs. With the scheme included, the offsite middleware treats every followed link as off-site and drops the requests, which matches your symptom: the start page is parsed, but nothing further is crawled.
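For reference, the two settings expect different things: allowed_domains takes bare domain names, while start_urls takes full URLs.

allowed_domains = ['somewebsite.com']                      # domain only, no scheme
start_urls      = ['http://somewebsite.com/listings.php']  # full URL is fine here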

To answer your comment: for the pagination you can use Rules; I'll let you check the CrawlSpider documentation. They will let you go through the pagination easily.

A small example:

rules = (Rule(LinkExtractor(allow=(), restrict_xpaths=('xpath/to/nextpage/button',)), callback='parse_item', follow=True),)
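Note that Rules only work on a CrawlSpider subclass, and a CrawlSpider must not use parse as a callback name, because it uses parse internally to drive the rules; that is why the callback above points at parse_item. Putting it together, a minimal sketch reusing the XPaths from your question (the next-page XPath is a placeholder you will have to adapt to the actual site):

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from testing.items import TestingItem

class MySpider(CrawlSpider):
    name            = 'testing'
    allowed_domains = ['somewebsite.com']  # domain only, no scheme
    start_urls      = ['http://somewebsite.com/listings.php']

    rules = (
        # Follow each listing link and parse the detail page.
        Rule(LinkExtractor(restrict_xpaths=('//h5/a',)),
             callback='parse_item'),
        # Follow the pagination button and keep crawling; no callback needed.
        # 'xpath/to/nextpage/button' is a placeholder, not a real XPath.
        Rule(LinkExtractor(restrict_xpaths=('xpath/to/nextpage/button',)),
             follow=True),
    )

    def parse_item(self, response):
        item = TestingItem()
        item['nimi'] = response.xpath('//*[@class="seller-info"]/h3/text()').extract()
        yield item

With follow=True and no callback on the pagination rule, Scrapy requests each next page and re-applies both rules to it, so every page of the listing gets its item links extracted.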

Hope this helps.

Upvotes: 1
