scrapy returns response.status 505

When I try to open a site with Scrapy, the response comes back with response.status 505:

505 HTTP Version Not Supported

The same site opens normally in the browser. Why might this be? How can this be fixed?

I run Scrapy from the console with this command:

scrapy shell 'https://xiaohua.zol.com.cn/detail60/59411.html'

Upvotes: 0

Answers (1)

Samsul Islam

Reputation: 2619

You should send browser-like request headers; the site seems to reject requests that carry Scrapy's defaults. Here is a demo with its output:

import scrapy
from scrapy.crawler import CrawlerProcess


class Xiaohua(scrapy.Spider):
    name = 'xiaohua'
    # Scrapy expects start_urls to be a list, not a bare string.
    start_urls = ['https://xiaohua.zol.com.cn/detail60/59411.html']

    def start_requests(self):
        # Headers copied from a real browser session.
        headers = {
            'authority': 'xiaohua.zol.com.cn',
            'cache-control': 'max-age=0',
            'sec-ch-ua': '"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"',
            'sec-ch-ua-mobile': '?0',
            'sec-ch-ua-platform': '"Linux"',
            'upgrade-insecure-requests': '1',
            'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'sec-fetch-site': 'cross-site',
            'sec-fetch-mode': 'navigate',
            'sec-fetch-user': '?1',
            'sec-fetch-dest': 'document',
            'accept-language': 'en-US,en;q=0.9',
            'cookie': 'z_pro_city=s_provice%3Dmengjiala%26s_city%3Dnull; userProvinceId=1; userCityId=0; userCountyId=0; userLocationId=1; ip_ck=7sWD7/jzj7QuOTIyODI0LjE2MzQxMTQxNzg%3D; lv=1634114179; vn=1; Hm_lvt_ae5edc2bc4fc71370807f6187f0a2dd0=1634114179; _ga=GA1.3.116086394.1634114186; _gid=GA1.3.2021660129.1634114186; Hm_lpvt_ae5edc2bc4fc71370807f6187f0a2dd0=1634114447; questionnaire_pv=1634083202; z_day=ixgo20%3D1%26icnmo11564%3D1; 22aa20c0da0b6f1d9a3155e8bf4c364e=cq11lgg54n27u10p%7B%7BZ%7D%7D%7B%7BZ%7D%7Dnull; MyZClick_22aa20c0da0b6f1d9a3155e8bf4c364e=/html/body/div%5B5%5D/div/div/div%5B2%5D/p/a/',
        }
        # Attach the headers to every start request.
        for url in self.start_urls:
            yield scrapy.Request(url=url, callback=self.parse, headers=headers)

    def parse(self, response):
        print(response.status)
        print('*' * 10)
        print(response.css('h1.article-title::text').get())
        print(response.css('ul.nav > li > a::text').getall())
        print('*' * 10)


if __name__ == '__main__':
    process = CrawlerProcess()
    process.crawl(Xiaohua)
    process.start()

Output:

200
**********
导演你能认真点儿吗
['笑话首页', '最新笑话', '冷笑话', '搞笑趣图', '搞笑视频', '上传笑话']
**********
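The demo copies every header from the browser, including session cookies that will go stale. If you want to check which headers the server actually needs, a quick probe outside Scrapy can help. This is only a sketch using the standard library: `MINIMAL_HEADERS`, `build_request` and `fetch_status` are names made up for illustration, and the reduced header set is an assumption that has not been verified against this site.

```python
import urllib.error
import urllib.request

URL = "https://xiaohua.zol.com.cn/detail60/59411.html"

# Assumption: a reduced, browser-like header set (not verified against this site).
MINIMAL_HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"
    ),
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-language": "en-US,en;q=0.9",
}


def build_request(url, headers):
    """Build (but do not send) a GET request carrying the given headers."""
    return urllib.request.Request(url, headers=headers)


def fetch_status(url, headers):
    """Send the request and return the HTTP status code, error codes included."""
    try:
        with urllib.request.urlopen(build_request(url, headers), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (such as 505) are raised as HTTPError.
        return err.code
```

Comparing `fetch_status(URL, {})` with `fetch_status(URL, MINIMAL_HEADERS)` shows whether the reduced set makes a difference (with an empty dict, urllib still sends its own default User-Agent). If the second call returns 200, the same dict will likely work as the `headers=` argument of `scrapy.Request`, though Scrapy's own defaults may still differ in other ways.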

Upvotes: 1
