How to fix 416

Question

Hi everyone, I want to scrape all the names, but when I run my code I get this error:

2019-08-25 23:08:10 [scrapy.core.engine] DEBUG: Crawled (416)


Code:
import scrapy

class project(scrapy.Spider):
    name = 'project'
    start_urls = ['https://www.manta.com/mb_43_A0_02/advertising_and_marketing/alaska']

    def parse(self, response):
        seller_name = response.css('.h4 strong::text').extract()
        yield {'seller name': seller_name}

Serhii · Accepted Answer

By default, Scrapy only passes successful responses (status codes in the 200-299 range) to your spider; everything else is filtered out by HttpErrorMiddleware: https://docs.scrapy.org/en/latest/topics/spider-middleware.html#module-scrapy.spidermiddlewares.httperror

To handle 416 responses, set handle_httpstatus_list on your spider:

from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):
    handle_httpstatus_list = [416]

Then you can handle this response in your callback:

if response.status == 416:
    # handle the 416 response here (log it, skip it, etc.)
    pass

In your case the website is protected by Distil Networks, and sites like that usually don't want to be scraped.
I think you should read the site's rules on scraping to see whether they allow it.
Of course, there are various services for bypassing this kind of protection (people mentioned them in the comments), but keep the ethical and legal side in mind.
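
One machine-readable part of a site's rules is its robots.txt (the terms of service are separate and you have to read them yourself). As a rough sketch, Python's standard urllib.robotparser can tell you whether a path is allowed. The robots.txt content, the user agent and the example.com URLs below are invented for illustration and are not the real rules of the site in the question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/public/page", "/private/page"):
        url = "https://example.com" + path
        print(path, is_allowed(EXAMPLE_ROBOTS_TXT, "my-bot", url))
```

Scrapy can also do this check for you: with ROBOTSTXT_OBEY = True in settings.py it skips requests that robots.txt disallows.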
