Reputation: 57
I have just started using Scrapy for web scraping. I have read a few documents that show how to scrape HTML pages. I tried it on the Entertainment Earth website, where I was trying to scrape only the title of each image, and later the price and the image itself. When I write the output, I don't get anything. Can anyone point out what I am doing wrong?
Here is the code.
# -*- coding: utf-8 -*-
import scrapy


class EeentertainmentSpider(scrapy.Spider):
    name = 'eeentertainment'
    allowed_domains = ['www.entertainmentearth.com/exclusives.asp']
    start_urls = ['http://www.entertainmentearth.com/exclusives.asp/']

    def parse(self, response):
        # Extracting the content using css selectors
        titles = response.css('.title::text').extract()
        # Give the extracted content row wise
        for item in zip(titles):
            # create a dictionary to store the scraped info
            scraped_info = {
                'title': item[0],
            }
            # yield or give the scraped info to scrapy
            yield scraped_info
        pass
And here is the web page in Inspect Element:
Upvotes: 0
Views: 160
Reputation: 10210
There are a couple of problems with your spider:
- The allowed_domains list should contain just domain names, not exact URLs (see the documentation).
- The URL in start_urls has a trailing / (it should read http://www.entertainmentearth.com/exclusives.asp).
- zip is used here, but I'm almost sure that's not intended.
- The pass at the end of the parse method is superfluous.

From what I can tell based on the screenshot provided, you are trying to scrape image titles from the page. For that, and taking into account the notes above, see the adapted spider code that works:
# -*- coding: utf-8 -*-
import scrapy


class EeentertainmentSpider(scrapy.Spider):
    name = 'eeentertainment'
    allowed_domains = ['entertainmentearth.com']
    start_urls = ['http://www.entertainmentearth.com/exclusives.asp']

    def parse(self, response):
        titles = response.css('img::attr(title)').extract()
        for title in titles:
            scraped_info = {
                'title': title,
            }
            yield scraped_info
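
To make the notes about allowed_domains and zip more concrete, here is a small sketch that uses only the Python standard library, so it runs without Scrapy. The titles list is hypothetical sample data standing in for whatever the selector returns; it is not taken from the real page.

```python
from urllib.parse import urlparse

# The URL from start_urls, without the trailing slash.
start_url = 'http://www.entertainmentearth.com/exclusives.asp'

# allowed_domains expects a host name, not a URL with a path.
# urlparse shows which part of the URL is the host.
host = urlparse(start_url).hostname
print(host)

# Hypothetical sample data standing in for the extracted titles.
titles = ['Title A', 'Title B']

# zip() over a single list wraps every element in a 1-tuple,
# which is why the original loop needed item[0].
zipped = list(zip(titles))
print(zipped)

# Iterating over the list directly yields the same values unwrapped.
unwrapped = [item[0] for item in zipped]
print(unwrapped == titles)
```

The host printed here includes the www prefix. The shorter entertainmentearth.com used in the spider above should work as well, since Scrapy treats subdomains of an allowed domain as allowed.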
Upvotes: 1