user2747776

Reputation: 109

Not able to scrape product URLs using Scrapy in Python

I want to extract all the product URLs from the page "http://www.shopclues.com/diwali-mega-mall/hot-electronics-sale-fs/audio-systems-fs.html" using Scrapy in Python. Below is the function I'm using to do this:

def parse(self, response):
    print("hello")

    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@id="pagination_contents"]')
    items = []
    i = 3
    for site in sites:
        item = DmozItem()
        item['link'] = site.select('div[2]/div[' + str(i) + ']/a/@href').extract()
        i = int(i) + 1
        print i
        items.append(item)
    return items

The XPath of each product div is: //div[@id="pagination_contents"]/div[2]/div['+str(i)+']/a/@href

But I'm getting only one link, not all the product URLs.

Upvotes: 0

Views: 413

Answers (1)

Birei

Reputation: 36282

I think your problem is that hxs.select('//div[@id="pagination_contents"]') returns only one result, so the body of your loop runs just once.

Instead, you can select all the inner <div> elements that contain an <a>, and loop over those:

sites = hxs.select('//div[@id="pagination_contents"]/div[2]/div[a]')
for site in sites:
    ## This loop will run 33 times in my test.
    ## Extract the link from each product div:
    item['link'] = site.select('./a[2]/@href').extract()
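
For completeness, here is a sketch of how the whole callback could look with that change. The items import path is only a guess based on your DmozItem; adjust it to your own project:

from scrapy.selector import HtmlXPathSelector
from myproject.items import DmozItem  # hypothetical module path; use your project's items module


def parse(self, response):
    hxs = HtmlXPathSelector(response)
    # Every inner <div> under div[2] that contains an <a> is one product.
    sites = hxs.select('//div[@id="pagination_contents"]/div[2]/div[a]')
    items = []
    for site in sites:
        item = DmozItem()
        # The leading './' keeps the XPath relative to the current product div,
        # so each iteration picks up a different link.
        item['link'] = site.select('./a[2]/@href').extract()
        items.append(item)
    return items

The key change is iterating over the product <div> elements themselves; the inner select then only needs a short relative expression instead of indexing into the page with str(i).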

Upvotes: 1
