Reputation: 433
Spider code:
import scrapy
from crawler.items import Item

class DmozSpider(scrapy.Spider):
    name = 'blabla'
    allowed_domains = ['blabla']

    def start_requests(self):
        yield scrapy.Request('http://blabla.org/forum/viewforum.php?f=123', self.parse)

    def parse(self, response):
        item = Item()
        # Note: the closing bracket of the predicate was missing in the
        # original post ('//a[@class="title"/text()'), which is invalid XPath.
        item['Title'] = response.xpath('//a[@class="title"]/text()').extract()
        yield item
        # Follow the "Next" pagination link, if one exists on the page.
        next_page = response.xpath('//a[text()="Next"]/@href')
        if next_page:
            url = response.urljoin(next_page[0].extract())
            yield scrapy.Request(url, callback=self.parse)
Problem: the spider stops after the first page even though next_page and url exist and are correct.
Here is the last debug message before stop:
[scrapy] DEBUG: Crawled (200) <GET http://blabla.org/forum/viewforum.php?f=123&start=50> (referer: http://blabla.org/forum/viewforum.php?f=123)
[scrapy] INFO: Closing spider (finished)
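One way to confirm why the crawl stops is to check whether the second response actually contains the "Next" anchor the XPath expects. A minimal sketch of that check in plain Python (the helper name has_next_link is hypothetical, and the regex stands in for the spider's //a[text()="Next"]/@href selector):

```python
import re

def has_next_link(html: str) -> bool:
    # Hypothetical helper: True if the body contains an anchor whose
    # visible text is exactly "Next" -- the link the spider follows.
    return re.search(r'<a\b[^>]*>\s*Next\s*</a>', html) is not None

# A normal forum listing page has the pagination link:
print(has_next_link('<a href="viewforum.php?f=123&start=50">Next</a>'))

# An anti-bot / robots interstitial typically does not:
print(has_next_link('<html><body>Are you a robot?</body></html>'))
```

If the second page's body fails this kind of check, the spider closing with "finished" is expected behaviour: next_page is empty, so no further Request is yielded.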
Upvotes: 0
Views: 99
Reputation: 433
The problem was that the response for the next page was a page served to robots and did not contain any links, so the "Next" XPath matched nothing and the spider closed normally.
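When a site serves a stripped-down page to crawlers, a common first step (an assumption on my part, not something stated in this answer) is to send a browser-like User-Agent and throttle the crawl in the project's settings.py. These are real Scrapy settings; the values below are only illustrative:

```python
# settings.py (sketch; values are illustrative assumptions)
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
DOWNLOAD_DELAY = 1.0   # seconds between requests, to look less bot-like
ROBOTSTXT_OBEY = True  # respect robots.txt (the default in new projects)
```

Whether this resolves it depends on how the site detects crawlers; inspecting the raw body of the second response (e.g. via scrapy shell) is the reliable way to see what the server actually returned.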
Upvotes: 0
Reputation: 829
You need to check the following.
Upvotes: 1