Reputation: 1064
Hi everyone, I want to scrape all the names, but when I run my code I get this error:
2019-08-25 23:08:10 [scrapy.core.engine] DEBUG: Crawled (416) <GET https://www.manta.com/distil_r_blocked.html?requestId=e243a58b-d46d-4d12-
HTTP status code is not handled or not allowed
Code:
import scrapy

class project(scrapy.Spider):
    name = 'project'
    start_urls = ['https://www.manta.com/mb_43_A0_02/advertising_and_marketing/alaska']

    def parse(self, response):
        seller_name = response.css('.h4 strong::text').extract()
        yield {'seller name': seller_name}
Upvotes: 0
Views: 305
Reputation: 1587
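By default, Scrapy only passes successful responses to your spider, that is, responses whose status code is in the 2xx range (200-299). Everything else is filtered out by the HttpError spider middleware: https://docs.scrapy.org/en/latest/topics/spider-middleware.html#module-scrapy.spidermiddlewares.httperror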
To handle the 416 response, set handle_httpstatus_list on your spider:
from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):
    handle_httpstatus_list = [416]
The response is then passed to your callback, where you can check for it:
if response.status == 416:
    # handle the blocked response here
    pass
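To make the filtering rule concrete, here is a minimal sketch in plain Python of how that decision works. It is a simplified model, not Scrapy's actual code: the function name reaches_callback is hypothetical, and it ignores per-request meta keys and the allow-all options.

```python
def reaches_callback(status, handle_httpstatus_list=()):
    """Simplified model of the HttpError filter.

    Returns True if a response with this status would be passed
    to the spider callback, False if it would be dropped.
    """
    # 2xx responses always go through
    if 200 <= status < 300:
        return True
    # Anything else goes through only if the spider opted in
    return status in handle_httpstatus_list


# Without opting in, the 416 response is dropped before parse() runs
print(reaches_callback(416))
# With handle_httpstatus_list = [416], it is delivered to parse()
print(reaches_callback(416, [416]))
```

Note that receiving the 416 response only lets you detect the block in your callback; the page you get is still the block page, not the listing you asked for.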
In your case the website is protected by Distil Networks, and sites like that usually do not want to be scraped.
I think you should read the site's rules on scraping to find out whether they allow it.
Of course, there are various services for bypassing the protection (as people mentioned in the comments), but keep ethics and the law in mind.
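One place to start when checking a site's stated rules is its robots.txt file. The sketch below uses Python's standard urllib.robotparser on an in-memory example. The robots.txt content, the user agent name and the example.com URLs are all made up for illustration; they are not Manta's actual rules, and robots.txt does not replace reading the site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "my-bot" and the URLs are placeholder values
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-bot", "https://example.com/private/page"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-bot", "https://example.com/public/page"))
```

Scrapy can also apply robots.txt rules for you if you enable the ROBOTSTXT_OBEY setting in your project.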
Upvotes: 1