Hamza Lachi

Reputation: 1064

How to fix a 416 <GET> error in Scrapy (Python)

Hi everyone, I want to scrape all the names, but when I run my code I get this error:

2019-08-25 23:08:10 [scrapy.core.engine] DEBUG: Crawled (416) <GET https://www.manta.com/distil_r_blocked.html?requestId=e243a58b-d46d-4d12-

HTTP status code is not handled or not allowed

Code:

import scrapy

class project(scrapy.Spider):
    name = 'project'
    start_urls = ['https://www.manta.com/mb_43_A0_02/advertising_and_marketing/alaska']

    def parse(self, response):
        seller_name = response.css('.h4 strong::text').extract()
        yield {'seller name': seller_name}


Answers (1)

Serhii

Reputation: 1587

By default, Scrapy only passes successful responses (status codes in the 200-300 range) to your spider callbacks; other responses are filtered out by HttpErrorMiddleware. See https://docs.scrapy.org/en/latest/topics/spider-middleware.html#module-scrapy.spidermiddlewares.httperror
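To make that filtering rule concrete, here is a simplified pure-Python model of the decision HttpErrorMiddleware makes. It is only an illustration, not Scrapy's actual code: the function name and parameters are hypothetical, and it ignores details such as the HTTPERROR_ALLOW_ALL setting. It does reflect the real knobs Scrapy documents: the handle_httpstatus_list and handle_httpstatus_all Request.meta keys, the spider's handle_httpstatus_list attribute, and the HTTPERROR_ALLOWED_CODES setting.

```python
def passes_httperror_filter(status, meta=None, spider_allowed=None,
                            settings_allowed=()):
    """Return True if a response with this status would reach the callback.

    Hypothetical, simplified model of Scrapy's HttpErrorMiddleware.
    """
    meta = meta or {}
    # 2xx responses always pass through.
    if 200 <= status < 300:
        return True
    # Request.meta can allow every status code...
    if meta.get("handle_httpstatus_all", False):
        return True
    # ...or a specific list, which takes precedence over the spider attribute.
    if "handle_httpstatus_list" in meta:
        allowed = meta["handle_httpstatus_list"]
    elif spider_allowed is not None:
        # The spider's handle_httpstatus_list class attribute.
        allowed = spider_allowed
    else:
        # Fallback: the HTTPERROR_ALLOWED_CODES setting.
        allowed = settings_allowed
    return status in allowed


print(passes_httperror_filter(416))                        # False
print(passes_httperror_filter(416, spider_allowed=[416]))  # True
```

This is why the 416 response in the question never reaches parse(): nothing lists 416 as an allowed status.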

To handle 416 responses, list the status code in your spider's handle_httpstatus_list attribute:

import scrapy

class MySpider(scrapy.Spider):
    handle_httpstatus_list = [416]  # let 416 responses reach parse()

The 416 response is then passed to your callback, where you can check its status:

if response.status == 416:
    # handle the 416 response here, e.g. log it and stop
    self.logger.warning('Got 416 for %s', response.url)
    return
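If parse() needs to tell the block page apart from a real listing page, a small helper like the one below could be called as classify_response(response.status, response.url). This is a hypothetical sketch: the helper name is made up, and the "distil_r_blocked" marker is only taken from the URL in the question's log, so a real block page might look different.

```python
# Marker taken from the blocked URL in the question's log (assumption: it is stable).
BLOCK_PAGE_MARKER = "distil_r_blocked"


def classify_response(status, url):
    """Hypothetical helper: label a response so parse() can decide what to do."""
    # The Distil block page in the log came back as 416 at a distil_r_blocked URL.
    if BLOCK_PAGE_MARKER in url or status == 416:
        return "blocked"
    if 200 <= status < 300:
        return "ok"
    return "error"


print(classify_response(416, "https://www.manta.com/distil_r_blocked.html"))  # blocked
```

On "blocked", the callback can simply log and return instead of trying to extract seller names from a page that does not contain them.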

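In your case, though, the website is protected by Distil Networks: your request ended up at the block page (distil_r_blocked.html) shown in the log, so handling the 416 status will not give you the seller names. Sites with this kind of protection usually do not want to be scraped.
I think you should read this website's rules about scraping to see whether they allow it.
Of course, there are services for bypassing this protection (people mentioned some in the comments), but keep the ethical and legal implications in mind.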
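One machine-readable part of a site's rules is its robots.txt. Scrapy can respect it for you through the ROBOTSTXT_OBEY setting; the sketch below shows the same idea with Python's standard urllib.robotparser. The robots.txt contents and example.com URLs are made up for illustration (they are not manta.com's real file), and robots.txt does not replace reading the site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed_by_robots(SAMPLE_ROBOTS_TXT, "https://www.example.com/private/page"))  # False
print(allowed_by_robots(SAMPLE_ROBOTS_TXT, "https://www.example.com/public/page"))   # True
```

Against a live site you would call RobotFileParser.set_url() and read() instead of parse(), or just leave ROBOTSTXT_OBEY = True in your Scrapy project settings.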
