I want to extract all the product URLs from the page "http://www.shopclues.com/diwali-mega-mall/hot-electronics-sale-fs/audio-systems-fs.html" using Scrapy in Python. Below is the function I'm using:
    from scrapy.selector import HtmlXPathSelector

    def parse(self, response):
        print("hello")
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@id="pagination_contents"]')
        items = []
        i = 3
        for site in sites:
            item = DmozItem()  # item class defined in the project's items.py
            item['link'] = site.select('div[2]/div[' + str(i) + ']/a/@href').extract()
            i += 1
            print(i)
            items.append(item)
        return items
The XPath of each product link, where i is the product's position, is:

    //div[@id="pagination_contents"]/div[2]/div['+str(i)+']/a/@href

But I'm getting only one link instead of all the product URLs.
I think your problem is that hxs.select('//div[@id="pagination_contents"]') returns only one result (the container itself), so the loop runs just once.
Instead, select every child <div> of its second <div> that contains an <a> element, and loop over those:
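    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@id="pagination_contents"]/div[2]/div[a]')
        items = []
        for site in sites:  # this loop runs 33 times in my test
            item = DmozItem()
            # access each link: the href of the second <a> in the product <div>
            item['link'] = site.select('./a[2]/@href').extract()
            items.append(item)
        return items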
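To see why the original loop ran only once, here is a minimal sketch that uses Python's standard-library ElementTree instead of Scrapy's selectors, run against a small hypothetical HTML snippet. The markup, class names and hrefs are made up and only mimic the structure implied by the XPaths above; they are not copied from the real page. ElementTree supports only a subset of XPath, but that subset covers the positional predicates div[2] and a[2] and the child-existence predicate div[a] used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the listing page.
# The real page's structure is assumed, not copied.
PAGE = """
<html><body>
  <div id="pagination_contents">
    <div class="toolbar"/>
    <div class="products">
      <div><a href="/img/1">img</a><a href="/product-1.html">Product 1</a></div>
      <div><a href="/img/2">img</a><a href="/product-2.html">Product 2</a></div>
      <div><a href="/img/3">img</a><a href="/product-3.html">Product 3</a></div>
      <div class="ad">no link here</div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# The question's selector matches only the container, so its loop runs once.
containers = root.findall(".//div[@id='pagination_contents']")

# The answer's selector matches every product <div> that has an <a> child;
# div[2] picks the second <div> child of the container.
product_divs = root.findall(".//div[@id='pagination_contents']/div[2]/div[a]")

# ElementTree cannot select @href directly, so read the attribute from the
# second <a> of each product <div> (the equivalent of './a[2]/@href').
links = [div.find("a[2]").get("href") for div in product_divs]

print(len(containers))
print(links)
```

The "ad" <div> has no <a> child, so div[a] skips it, while the three product <div>s each yield one link.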