Scrapy - Not Iterating Through

Question

I am brand new to Scrapy and am experimenting based on the docs and tutorial. I am writing a simple bot to scrape Hacker News and ultimately want to extract only stories with a certain number of points. I have reached a point where my loop fills in the same story title and link for every result on pages 1 and 2. How do I get it to check every story instead of just the first one on each page? The code is as follows:

import scrapy

class ArticlesSpider(scrapy.Spider):
    name = 'articles'
    start_urls = [
        'https://news.ycombinator.com',
        'https://news.ycombinator.com/news?p=2'
    ]

    def parse(self, response):
        link = response.css('tr.athing')
        for website in link:
            yield {
                'title': link.css('tr.athing td.title a.storylink::text').get(),
                'link':  link.css('tr.athing td.title a::attr(href)').get()
            }

The output in my console is the title and link in dict form but the same exact one (30 times) per page. What am I doing wrong?

Georgiy · Accepted Answer

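Inside the loop you need to call website.css(...) (the current row), not link.css(...) (the whole list of rows). Calling .get() on a query against link always returns the first match on the page, which is why every item is identical. It should look like this: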

    def parse(self, response):
        link = response.css('tr.athing')
        for website in link:
            yield {
                'title': website.css('td.title a.storylink::text').get(),
                'link':  website.css('td.title a::attr(href)').get()
            }
