Reputation: 1
I am trying to scrape reviews from BestBuy. The extraction works fine when the code is executed line by line in the shell, but not when it is run as a script. What is wrong?
    class BestbuybotSpider(scrapy.Spider):
        name = 'bestbuybot'
        allowed_domains = ['https://www.bestbuy.com/site/amazon-echo-dot-3rd-gen-smart-speaker-with-alexa-charcoal/6287974.p?skuId=6287974']
        start_urls = ['http://https://www.bestbuy.com/site/amazon-echo-dot-3rd-gen-smart-speaker-with-alexa-charcoal/6287974.p?skuId=6287974/']

        def parse(self, response):
            # Extract the content using CSS selectors
            rating = response.css("div.c-ratings-reviews-v2.v-small p::text").extract()
            title = response.css(".review-title.c-section-title.heading-5.v-fw-medium ::text").extract()

            # Give the extracted content row-wise
            for item in zip(rating, title):
                # Create a dictionary to store the scraped info
                scraped_info = {
                    'rating': item[0],
                    'title': item[1],
                }

                # Yield the scraped info to Scrapy
                yield scraped_info
Upvotes: 0
Views: 162
Reputation: 360
There are some problems with your code, namely:
allowed_domains should be a domain and not a URL.
start_urls has 'http://https:' at the start, which is a doubled scheme.
As you can see in your image, the Scrapy spider is redirected to finder.cox.net, so it never reaches the product page; instead it is presented with a country selection page, which is a redirect.
First fix your start URL so that it uses the exact page location; with that change the spider seems to work.
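As a rough illustration of the two fixes above, the sketch below uses Python's standard urllib.parse to show which host each form of the URL actually points at. The names broken_url, fixed_url, and host_of are made up for this example, and the allowed_domains value of bestbuy.com is an assumption about what the spider should be limited to, not something confirmed in the question.

```python
from urllib.parse import urlsplit

# The start URL exactly as written in the question (note the doubled scheme
# and the trailing slash after the query string).
broken_url = (
    "http://https://www.bestbuy.com/site/"
    "amazon-echo-dot-3rd-gen-smart-speaker-with-alexa-charcoal/"
    "6287974.p?skuId=6287974/"
)

# The same page with a single scheme and no trailing slash.
fixed_url = (
    "https://www.bestbuy.com/site/"
    "amazon-echo-dot-3rd-gen-smart-speaker-with-alexa-charcoal/"
    "6287974.p?skuId=6287974"
)


def host_of(url):
    """Return the host name that the URL parser extracts from this URL."""
    return urlsplit(url).hostname


broken_host = host_of(broken_url)  # the text after "http://" is read as the host
fixed_host = host_of(fixed_url)

# Values one might then set on the spider class.
# Assumption: the bare domain is what allowed_domains should contain.
allowed_domains = ["bestbuy.com"]
start_urls = [fixed_url]

print(broken_host)  # https
print(fixed_host)   # www.bestbuy.com
```

With the doubled scheme, the parsed host is the literal text https rather than www.bestbuy.com, which may be why the request ended up on an unrelated page. In the spider, allowed_domains and start_urls would take the two values shown at the end of the sketch.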
Upvotes: 0