Get value of text (with no tag) in scrapy

Question

I am trying to get the value of text that has no tag of its own, as on this page:

https://timesofindia.indiatimes.com/us/donald-trump-boris-johnson-talk-5g-and-trade-ahead-of-g7-white-house/articleshow/70504270.cms

So far I have used the Scrapy shell to get the values with this code:

item=response.xpath("//div[@class='Normal']/text()").extract()

Or

item=response.css('arttextxml *::text').extract()

The problem is that these commands return values in the Scrapy shell, but when I use them in my Scrapy spider file they return a null value.

Is there any solution for this problem?

Tony Montana · Accepted Answer

There are multiple problems with your code.

First, it is messy. Second, the CSS selector you use to collect the links to the news articles returns the same URL more than once. Third, in the scrapy.Request call you pass self.parseNews as the callback, but no such method is defined anywhere in the file.
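To illustrate the second point, here is a minimal sketch of dropping repeated links before requesting them. The helper name unique_absolute_urls and the sample hrefs are made up for illustration; they are not taken from the original spider.

```python
from urllib.parse import urljoin

BASE_URL = "https://timesofindia.indiatimes.com/"


def unique_absolute_urls(hrefs, base_url=BASE_URL):
    """Return absolute URLs in first-seen order, skipping None and repeats."""
    # dict.fromkeys keeps insertion order and discards duplicate keys.
    seen = dict.fromkeys(urljoin(base_url, href) for href in hrefs if href)
    return list(seen)


# Hypothetical hrefs, as a selector might return them (note the repeat and the None).
hrefs = ["/world/us/a.cms", "/world/us/a.cms", None, "/world/uk/b.cms"]
urls = unique_absolute_urls(hrefs)
print(urls)
```

Scrapy's default duplicate filter also drops repeated requests to the same URL, so this mainly matters if you collect the links for some other purpose.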

I have fixed your code to some extent, and it now runs without any issues for me.

# -*- coding: utf-8 -*-
import scrapy


class TimesofindiaSpider(scrapy.Spider):
    name = 'timesofindia'
    allowed_domains = ["timesofindia.indiatimes.com"]
    start_urls = ["https://timesofindia.indiatimes.com/World"]

    def parse(self, response):
        # Each <li> in the top news list links to one article.
        for item in response.css('div.top-newslist > ul > li'):
            url = item.css('a::attr(href)').extract_first()
            if url:
                # urljoin handles both relative and absolute hrefs.
                yield scrapy.Request(response.urljoin(url), callback=self.parse_save)

    def parse_save(self, response):
        # Text nodes that sit directly inside div.Normal, without a tag of their own.
        print(response.xpath("//div[@class='Normal']/text()").extract())
