Vincent

Reputation: 111

Scrapy-crawled-200 Referer-None

I'm trying to learn how to use Scrapy and Python, but I'm not an expert at all...
I get an empty file after crawling this page:

so.news.cn, and I don't understand why...

Here is my code:

import scrapy

class XinhuaSpider(scrapy.Spider):
    name = 'xinhua'
    allowed_domains = ['xinhuanet.com']
    start_urls = ['http://so.news.cn/?keyWordAll=&keyWordOne=%E6%96%B0%E5%86%A0+%E8%82%BA%E7%82%8E+%E6%AD%A6%E6%B1%89+%E7%97%85%E6%AF%92&keyWordIg=&searchFields=1&sortField=0&url=&senSearch=1&lang=cn#search/0/%E6%96%B0%E5%86%A0/1/']

    def parse(self, response):
        #titles = response.css('#newsCon > div.newsList > div.news > h2 > a::text').extract()
        #date = response.css('#newsCon > div.newsList > div.news > div > p.newstime > span::text').extract()
        titles = response.xpath("/html/body/div[@id='search-result']/div[@class='resultCnt']/div[@id='resultList']/div[@class='newsListCnt secondlist']/div[@id='newsCon']/div[@class='newsList']/div[@class='news']/h2/a/text()").extract()
        date = response.xpath("/html/body/div[@id='search-result']/div[@class='resultCnt']/div[@id='resultList']/div[@class='newsListCnt secondlist']/div[@id='newsCon']/div[@class='newsList']/div[@class='news']/div[@class='easynews']/p[@class='newstime']/span/text()").extract()
        for item in zip(titles, date):
            scraped_info = {
                "title": item[0],
                "date": item[1],
            }
            yield scraped_info

        nextPg = response.xpath("/html/body/div[@id='search-result']/div[@class='resultCnt']/div[@id='pagination']/a[@class='next']/@href").extract()
        if nextPg is not None:
            print(nextPg)

This is the message in the console:

2020-05-11 00:09:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://so.news.cn/?keyWordAll=&keyWordOne=%E6%96%B0%E5%86%A0+%E8%82%BA%E7%82%8E+%E6%AD%A6%E6%B1%89+%E7%97%85%E6%AF%92&keyWordIg=&searchFields=1&sortField=0&url=&senSearch=1&lang=cn#search/0/%E6%96%B0%E5%86%A0/1/> (referer: None)
[]

Upvotes: 0

Views: 403

Answers (1)

gangabass

Reputation: 10666

You should always check the page's source code (Ctrl+U) in your browser. The content you see in the browser may be loaded by an XHR (JavaScript) call, in which case it won't be in the HTML that Scrapy downloads. Here is code that works for me (I found the correct start URL using the Chrome Developer Console):

import scrapy
import json
import re

class XinhuaSpider(scrapy.Spider):
    name = 'xinhua'
    # allowed_domains = ['xinhuanet.com']
    start_urls = ['http://so.news.cn/getNews?keyWordAll=&keyWordOne=%25E6%2596%25B0%25E5%2586%25A0%2B%25E8%2582%25BA%25E7%2582%258E%2B%25E6%25AD%25A6%25E6%25B1%2589%2B%25E7%2597%2585%25E6%25AF%2592&keyWordIg=&searchFields=1&sortField=0&url=&senSearch=1&lang=cn&keyword=%E6%96%B0%E5%86%A0&curPage=1']

    def parse(self, response):
        data = json.loads(response.body)
        for item in data["content"]["results"]:
            scraped_info = {
                "title": item['title'],
                "date": item['pubtime'],
            }
            yield scraped_info

        current_page = data['content']['curPage']
        total_pages = data['content']['pageCount']
        if current_page < total_pages:
            next_page = re.sub(r'curPage=\d+', f"curPage={current_page + 1}", response.url)
            yield scrapy.Request(
                url=next_page,
                callback=self.parse,
            )

Upvotes: 1
