Reputation: 1950
I am trying to use Scrapy on a site which I do not know the URL structure of.
I would like to:
only extract data from pages which contain the XPath //div[@class="product-view"]
extract and print (in CSV) the URL, plus the name and the price found by their XPaths
When I run the script below, all I get is a random list of URLs:
scrapy crawl dmoz > test.txt
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider
from scrapy.http import Request

DOMAIN = 'site.com'
URL = 'http://%s' % DOMAIN

class MySpider(BaseSpider):
    name = "dmoz"
    allowed_domains = [DOMAIN]
    start_urls = [
        URL
    ]

    def parse(self, response):
        for url in response.xpath('//a/@href').extract():
            if not (url.startswith('http://') or url.startswith('https://')):
                url = URL + url
            if response.xpath('//div[@class="product-view"]'):
                url = response.extract()
                name = response.xpath('//div[@class="product-name"]/h1/text()').extract()
                price = response.xpath('//span[@class="product_price_details"]/text()').extract()
            yield Request(url, callback=self.parse)
            print url
Upvotes: 2
Views: 1543
Reputation: 21436
What you are looking for here is scrapy.spiders.CrawlSpider.
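A CrawlSpider version would look roughly like the sketch below (the spider name, domain and XPaths are taken from your snippet; the ProductSpider class name and the "follow everything" rule are just assumptions):

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class ProductSpider(CrawlSpider):
    name = "dmoz"
    allowed_domains = ['site.com']
    start_urls = ['http://site.com']

    # follow every link found and run parse_item on each downloaded page
    rules = (
        Rule(LinkExtractor(), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # only product pages contain this div
        if response.xpath('//div[@class="product-view"]'):
            yield {
                'url': response.url,
                'name': response.xpath('//div[@class="product-name"]/h1/text()').extract_first(),
                'price': response.xpath('//span[@class="product_price_details"]/text()').extract_first(),
            }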
However, you almost got it with your own approach. Here's the fixed version.
from scrapy.linkextractors import LinkExtractor
from scrapy.http import Request

def parse(self, response):
    # parse this page
    if response.xpath('//div[@class="product-view"]'):
        item = dict()
        item['url'] = response.url
        item['name'] = response.xpath('//div[@class="product-name"]/h1/text()').extract_first()
        item['price'] = response.xpath('//span[@class="product_price_details"]/text()').extract_first()
        yield item  # return an item with your data

    # other pages
    le = LinkExtractor()  # LinkExtractor is smarter than xpath '//a/@href'
    for link in le.extract_links(response):
        yield Request(link.url)  # default callback is already self.parse
Now you can simply run scrapy crawl dmoz -o results.csv
and Scrapy will output a CSV of your items. Keep an eye on the log though, especially the stats bit at the end; that's how you know if something went wrong.
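If you want the CSV fields spelled out in one place you can also declare them explicitly instead of yielding a plain dict; a minimal sketch, where the ProductItem name is just a placeholder for whatever you call it:

import scrapy

class ProductItem(scrapy.Item):
    # declared fields make the exported columns explicit
    url = scrapy.Field()
    name = scrapy.Field()
    price = scrapy.Field()

Then yield ProductItem(url=response.url, name=..., price=...) from parse instead of the dict.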
Upvotes: 2