Vikas Dhiman

Reputation: 57

Scrapy not returning any scraped item

I have just started using Scrapy for web scraping. I have read a few tutorials that show how to scrape HTML pages, and I tried it on the Entertainment Earth website. For now I only want to scrape the title of each image; later on I also want the price and the image itself. When I run the spider, I don't get any items back. Can anyone please point out what I am doing wrong?

Here is the code.

# -*- coding: utf-8 -*-
import scrapy


class EeentertainmentSpider(scrapy.Spider):
    name = 'eeentertainment'
    allowed_domains = ['www.entertainmentearth.com/exclusives.asp']
    start_urls = ['http://www.entertainmentearth.com/exclusives.asp/']

    
    def parse(self, response):
        #Extracting the content using css selectors
        titles = response.css('.title::text').extract()
        
       
        #Give the extracted content row wise
        for item in zip(titles):
            #create a dictionary to store the scraped info
            scraped_info = {
                'title' : item[0],
                
            }

            #yield or give the scraped info to scrapy
            yield scraped_info
        pass

And here is the webpage's inspect-element view: [screenshot not preserved]

Upvotes: 0

Views: 160

Answers (1)

Tomáš Linhart

Reputation: 10210

There are a couple of problems with your spider:

  • the allowed_domains list should contain only domain names, not full URLs (see the documentation)
  • the URL in start_urls has a trailing / (it should read http://www.entertainmentearth.com/exclusives.asp)
  • I'm not sure what you are trying to do with zip here; with a single list it only wraps each title in a one-element tuple, so it is almost certainly not what you intended
  • the pass at the end of the parse method is superfluous
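To make the first three points concrete, here is a minimal sketch that runs without Scrapy. The titles list is a hypothetical stand-in for whatever response.css(...).extract() would return, and stripping the "www." prefix is just one way to arrive at a bare domain name:

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the list returned by response.css(...).extract()
titles = ["Item A", "Item B"]

# zip() over a single list wraps each element in a one-element tuple,
# which is why the original code had to index with item[0]
zipped = list(zip(titles))
print(zipped)  # [('Item A',), ('Item B',)]

# The start URL as written in the question (note the trailing slash)
url = "http://www.entertainmentearth.com/exclusives.asp/"

# allowed_domains wants a bare host or domain name: no scheme, no path
host = urlparse(url).hostname
domain = host[4:] if host.startswith("www.") else host
print(domain)  # entertainmentearth.com

# Dropping the trailing slash gives the URL recommended above
start_url = url.rstrip("/")
print(start_url)  # http://www.entertainmentearth.com/exclusives.asp
```

Iterating over titles directly, as in "for title in titles", gives you the plain strings without the tuple wrapping.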

From what I can tell from the screenshot, you are trying to scrape the image titles from the page. They appear to be stored in the title attribute of the img elements, so the selector changes to img::attr(title). Taking the notes above into account, here is an adapted spider that works:

# -*- coding: utf-8 -*-
import scrapy

class EeentertainmentSpider(scrapy.Spider):
    name = 'eeentertainment'
    allowed_domains = ['entertainmentearth.com']
    start_urls = ['http://www.entertainmentearth.com/exclusives.asp']

    def parse(self, response):
        titles = response.css('img::attr(title)').extract()
        for title in titles:
            scraped_info = {
                'title' : title,
            }
            yield scraped_info
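If it helps to see why the selector targets an attribute rather than element text, here is a rough illustration using Python's standard-library html.parser instead of Scrapy's selectors. It is only a stand-in for what img::attr(title) picks out, and the SAMPLE_HTML markup and the ImgTitleCollector class are made up for this example, not taken from the real page:

```python
from html.parser import HTMLParser

# Hypothetical markup: the title lives in an attribute of <img>,
# and the tag has no text content of its own
SAMPLE_HTML = (
    '<div>'
    '<a href="/p/1"><img src="a.jpg" title="Item A"></a>'
    '<a href="/p/2"><img src="b.jpg" title="Item B"></a>'
    '</div>'
)


class ImgTitleCollector(HTMLParser):
    """Collects the title attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "img":
            title = dict(attrs).get("title")
            if title is not None:
                self.titles.append(title)


collector = ImgTitleCollector()
collector.feed(SAMPLE_HTML)
img_titles = collector.titles
print(img_titles)  # ['Item A', 'Item B']
```

A text-based selector has nothing to return for markup like this, because the img tags contain no text.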

Upvotes: 1
