I am using Scrapy to get some information from all pages of a website. Here is my dmoz_spider.py file. When I execute it, I get an IndentationError. Please help me out.
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item, Field
import string
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class EypItem(Item):
    title = Field()
    link = Field()
    price = Field()
    review = Field()

class eypSpider(CrawlSpider):
    name = "dmoz"
    allowed_domains =["http://www.walgreens.com"]
    start_urls =["http://www.walgreens.com/search/results.jsp?Ntt=allergy%20medicine"]
    rules = (Rule(SgmlLinkExtractor(allow=('/search/results\.jsp', )), callback='parse_item', follow= True),)

    def parse_item(self, response):
        self.log('Hi, this is an item page! %s' % response.url)
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@id="productGrid"]')
        items = []
        for site in sites:
            itemE = EypItem()
            itemE["title"] = site.select('//*[@class="image-container"]/a/img/@alt').extract()
            itemE["link"] = site.select('//*[@class="image-container"]/a/img/@src').extract()
            itemE["price"] = site.select('//*[@class="pricing"]/div/p/text()').extract()
            itemE["review"] = site.select('//*[@class="reviewSnippet"]/div/div/span/text()').extract()
            items.append(itemE)
        return items
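For reference, Python raises IndentationError when the body of a class, function, or loop is not indented under its header line; every statement inside `class EypItem`, `class eypSpider`, `parse_item`, and the `for` loop has to be indented one level deeper than the line that opens it. A minimal reproduction, assuming a hypothetical helper `indentation_error_of` that uses `compile()` so the error can be caught instead of stopping the script:

```python
# Hypothetical reproduction of the error: a class whose body is flush-left.
BAD_SOURCE = "class EypItem(object):\ntitle = 1\n"
# The same class with its body indented one level (4 spaces).
GOOD_SOURCE = "class EypItem(object):\n    title = 1\n"


def indentation_error_of(source):
    """Return the IndentationError message for source, or None if it compiles."""
    try:
        compile(source, "<dmoz_spider>", "exec")
    except IndentationError as exc:
        return exc.msg
    return None


print(indentation_error_of(BAD_SOURCE))   # a message mentioning "expected an indented block"
print(indentation_error_of(GOOD_SOURCE))  # None
```

The exact wording of the message varies between Python versions, so only the presence of the error is worth relying on.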
Aside from the indentation error, your allowed_domains has been specified incorrectly: it should list domain names, not URLs. Change it as follows (that is, remove the "http://" prefix):
allowed_domains =["www.walgreens.com"]
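The reason the prefix matters is that Scrapy's offsite filtering compares the host name of each request URL against the entries in allowed_domains, so an entry containing a scheme such as "http://" can never match a host name. The sketch below is a simplified, hypothetical illustration of that idea (the function `url_is_allowed` is not part of Scrapy, and Scrapy's actual middleware differs in detail):

```python
from urllib.parse import urlparse


def url_is_allowed(url, allowed_domains):
    """Hypothetical, simplified offsite check: True if the URL's host equals
    one of the allowed domains or is a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower()
        # Exact match, or a subdomain such as "www.walgreens.com" for "walgreens.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False


start = "http://www.walgreens.com/search/results.jsp?Ntt=allergy%20medicine"
print(url_is_allowed(start, ["http://www.walgreens.com"]))  # False: a URL is not a host name
print(url_is_allowed(start, ["www.walgreens.com"]))         # True
print(url_is_allowed(start, ["walgreens.com"]))             # True: subdomains also match
```

Under this kind of matching, "walgreens.com" would also cover other subdomains of the site, while "www.walgreens.com" restricts the crawl to that one host.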