Extracting Text/Parameter *within* a Tag

Question

I have the following source code from which I am attempting to extract my desired information:

What I want to extract is the title="page 2 of 31" information from within the final tag itself. I can get the tag with the following code:

response.xpath('//div[@id="PaginationBottom"]//a[@class="next"]').extract()

Is it possible to extract a parameter's value from within the tag itself? I can't find information on this anywhere, but I'm brand new to XPath and don't know the best search terms. Thanks for any help!

alecxe · Accepted Answer

Add /@title to the end of your XPath expression to select the value of the title attribute instead of the element itself:

//div[@id="PaginationBottom"]//a[@class="next"]/@title

Demo from the scrapy shell:

>>> response.xpath('//div[@id="PaginationBottom"]//a[@class="next"]/@title').extract()
[u'page 2 of 31']
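
To try the same idea outside Scrapy, here is a minimal sketch that uses only Python's standard library. It swaps in xml.etree.ElementTree, which supports just a subset of XPath and cannot evaluate the trailing /@title step, so it selects the a element and reads the attribute with .get() instead. The markup is hypothetical, since the page source from the question is not shown, and it has to be well-formed XML for this parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the page from the question.
HTML = """
<html><body>
  <div id="PaginationBottom">
    <a class="prev" title="page 1 of 31" href="?p=1">Prev</a>
    <a class="next" title="page 2 of 31" href="?p=2">Next</a>
  </div>
</body></html>
""".strip()

root = ET.fromstring(HTML)

# ElementTree cannot return attribute nodes from a path expression,
# so select the element first and then read its attribute.
next_link = root.find(".//div[@id='PaginationBottom']//a[@class='next']")
title = next_link.get("title") if next_link is not None else None

print(title)  # page 2 of 31
```

With Scrapy's selectors the /@title step shown above does this in a single expression, which is usually the more convenient form inside a spider.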

As a follow-up: you will probably also want the total number of pages, the 31 in "page 2 of 31", from the title attribute value. The Scrapy selector's built-in re() method is helpful here:

>>> response.xpath('//div[@id="PaginationBottom"]/a[@class="next"]/@title').re(r'page \d+ of (\d+)')
[u'31']
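
If both numbers are needed as integers, the same pattern can be applied with the standard re module once the title string has been extracted. This is only a sketch: the helper name parse_page_title is made up for illustration, and it assumes the title always has the form "page N of M".

```python
import re

# Same pattern as in the answer, with a second group for the current page.
PAGE_TITLE = re.compile(r"page (\d+) of (\d+)")


def parse_page_title(title):
    """Return (current_page, total_pages) as ints, or None if the title does not match."""
    match = PAGE_TITLE.search(title)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


current, total = parse_page_title("page 2 of 31")
print(current, total)  # 2 31
```

Returning None for a title that does not match lets a spider skip pagination on pages without a "next" link rather than fail on a missing match.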
