Sneha L

Reputation: 1

Scrapy crawl not extracting data

I am trying to scrape reviews from BestBuy. The extraction works fine when I run the code line by line in the shell, but not when I run it as a script. What is wrong?

class BestbuybotSpider(scrapy.Spider):
    name = 'bestbuybot'
    allowed_domains = ['https://www.bestbuy.com/site/amazon-echo-dot-3rd-gen-smart-speaker-with-alexa-charcoal/6287974.p?skuId=6287974']
    start_urls = ['http://https://www.bestbuy.com/site/amazon-echo-dot-3rd-gen-smart-speaker-with-alexa-charcoal/6287974.p?skuId=6287974/']


    def parse(self, response):
        #Extracting the content using css selectors
        rating = response.css("div.c-ratings-reviews-v2.v-small p::text").extract()
        title = response.css(".review-title.c-section-title.heading-5.v-fw-medium  ::text").extract()

        #Give the extracted content row wise
        for item in zip(rating,title):
            #create a dictionary to store the scraped info
            scraped_info = {
                'rating' : item[0],
                'title' : item[1],
            }

            #yield or give the scraped info to scrapy
            yield scraped_info

Console Image

Upvotes: 0

Views: 162

Answers (1)

Sam

Reputation: 360

There are some problems with your code, namely:

  1. allowed_domains should contain a domain (for example bestbuy.com), not a full URL.
  2. Your start URL has a malformed scheme: it begins with 'http://https://'.
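The effect of the doubled scheme can be checked with the standard library alone. The snippet below is only an illustrative sketch: `host_of` is a hypothetical helper name, and the "good" URL is an assumption about what the start URL was meant to be (the question's URL with the extra `http://` and the trailing slash removed).

```python
from urllib.parse import urlparse

# The start URL exactly as written in the question (note the doubled scheme).
bad_url = (
    "http://https://www.bestbuy.com/site/"
    "amazon-echo-dot-3rd-gen-smart-speaker-with-alexa-charcoal/"
    "6287974.p?skuId=6287974/"
)

# Assumed intended URL: a single scheme and no trailing slash.
good_url = (
    "https://www.bestbuy.com/site/"
    "amazon-echo-dot-3rd-gen-smart-speaker-with-alexa-charcoal/"
    "6287974.p?skuId=6287974"
)


def host_of(url):
    """Return the host name that a request to `url` would be sent to."""
    return urlparse(url).hostname


print(host_of(bad_url))   # https
print(host_of(good_url))  # www.bestbuy.com
```

With the malformed URL the request is addressed to a host literally named "https" rather than to www.bestbuy.com, so the product page is never requested.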

As your image shows, the spider is redirected to finder.cox.net, so it never reaches the product page. Instead it is presented with a country selection page, which is the result of that redirect.

First fix your start URL so that it points to the exact page location; with that change the spider seems to work.

Upvotes: 0
