I'm new to Scrapy, and I've just started looking into XPath.
I'm trying to extract titles and links from HTML list items within a div. The following code is how I thought I'd go about it (selecting the ul inside the div by its id, then looping through the list items):
```python
def parse(self, response):
    for t in response.xpath('//*[@id="categories"]/ul'):
        for x in t.xpath('//li'):
            item = TgmItem()
            item['title'] = x.xpath('a/text()').extract()
            item['link'] = x.xpath('a/@href').extract()
            yield item
```
But I got the same results as with this attempt:
```python
def parse(self, response):
    for x in response.xpath('//li'):
        item = TgmItem()
        item['title'] = x.xpath('a/text()').extract()
        item['link'] = x.xpath('a/@href').extract()
        yield item
```
In both cases, the exported CSV file contains data from every li on the page, from the top of the source code to the bottom, not just the ones inside the categories div.
I'm not an expert and I've made a number of attempts. If anyone could shed some light on this, it would be appreciated.
You need to start the XPath expression used inside the inner loop with a dot:
```python
for t in response.xpath('//*[@id="categories"]/ul'):
    for x in t.xpath('.//li'):  # leading dot: search only inside the current <ul>
        ...
```
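This makes the search start from the current element rather than from the root of the whole page. Without the dot, `//li` is an absolute path, so it matches every li in the document no matter which selector you call it on.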
See the "Working with relative XPaths" section of the Scrapy selectors documentation for more explanation.