Reputation: 1104
scrapy when trying to open the site returns response.status 505
505 HTTP Version Not Supported
The same site opens normally in the browser. Why might this be? How can this be fixed?
I call scrapy in console by this command line:
scrapy shell 'https://xiaohua.zol.com.cn/detail60/59411.html'
Upvotes: 0
Views: 54
Reputation: 2619
You should use proper headers to extract the data. here is a demo with output
import scrapy
from scrapy.crawler import CrawlerProcess
import json
class Xiaohua(scrapy.Spider):
name = 'xiaohua'
start_urls = 'https://xiaohua.zol.com.cn/detail60/59411.html'
def start_requests(self):
headers = {
'authority': 'xiaohua.zol.com.cn',
'cache-control': 'max-age=0',
'sec-ch-ua': '"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Linux"',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'sec-fetch-site': 'cross-site',
'sec-fetch-mode': 'navigate',
'sec-fetch-user': '?1',
'sec-fetch-dest': 'document',
'accept-language': 'en-US,en;q=0.9',
'cookie': 'z_pro_city=s_provice%3Dmengjiala%26s_city%3Dnull; userProvinceId=1; userCityId=0; userCountyId=0; userLocationId=1; ip_ck=7sWD7/jzj7QuOTIyODI0LjE2MzQxMTQxNzg%3D; lv=1634114179; vn=1; Hm_lvt_ae5edc2bc4fc71370807f6187f0a2dd0=1634114179; _ga=GA1.3.116086394.1634114186; _gid=GA1.3.2021660129.1634114186; Hm_lpvt_ae5edc2bc4fc71370807f6187f0a2dd0=1634114447; questionnaire_pv=1634083202; z_day=ixgo20%3D1%26icnmo11564%3D1; 22aa20c0da0b6f1d9a3155e8bf4c364e=cq11lgg54n27u10p%7B%7BZ%7D%7D%7B%7BZ%7D%7Dnull; MyZClick_22aa20c0da0b6f1d9a3155e8bf4c364e=/html/body/div%5B5%5D/div/div/div%5B2%5D/p/a/',
}
yield scrapy.Request(url= self.start_urls , callback=self.parse, headers=headers)
def parse(self, response):
print(response.status)
print('*'*10)
print(response.css('h1.article-title::text').get())
print(response.css('ul.nav > li > a::text').getall())
print('*'*10)
process = CrawlerProcess()
process.crawl(Xiaohua)
process.start()
output
200
**********
导演你能认真点儿吗
['笑话首页', '最新笑话', '冷笑话', '搞笑趣图', '搞笑视频', '上传笑话']
**********
Upvotes: 1