Bee Smears

Reputation: 803

Extracting Text/Parameter *within* a Tag

I have the following source code from which I am attempting to extract my desired information:

<div id="PaginationBottom" class="pagination">
    <a href="#" data-page="2" title="page 2 of 31" >2</a>
    <a href="#" data-page="3" title="page 3 of 31" >3</a>
    <a href="#" data-page="4" title="page 4 of 31" >4</a>
    <a href="#" data-page="10" title="page 10 of 31" >10</a>
    <a href="#" data-page="2" title="page 2 of 31" class="next" >next &raquo;</a>
</div>

I want to extract the value of the title attribute ("page 2 of 31") from the final <a> tag. I can select the tag itself with the following code:

response.xpath('//div[@id="PaginationBottom"]//a[@class="next"]').extract()

Is it possible to extract an attribute's value from within the tag itself? I can't find information on this anywhere, but I'm brand new to XPath and don't know the best search terms. Thanks for any help!

Upvotes: 0

Views: 60

Answers (2)

Prince Bhatti

Reputation: 5031

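Try a simple regex like this (htmltext is the HTML text you want to parse):

import re

# Capture whatever sits between the data-page attribute and the link text.
regex1 = r'<a href="#" data-page="2"(.+?)>2</a>'
pattern1 = re.compile(regex1)
extracted_text = pattern1.findall(htmltext)
print(extracted_text)

This extracts everything between <a href="#" data-page="2" and >2</a>. For the HTML in the question, the output is a list whose single item is title="page 2 of 31" with a space on either side. Note that the pattern is hard-coded to the page 2 link, so it matches the first link rather than the one with class="next".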

Upvotes: 0

alecxe

Reputation: 473863

Add /@title to the end of your XPath expression to select the attribute's value instead of the whole element:

//div[@id="PaginationBottom"]//a[@class="next"]/@title

Demo from the scrapy shell:

>>> response.xpath('//div[@id="PaginationBottom"]//a[@class="next"]/@title').extract()
[u'page 2 of 31']
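
If you want to check the expected value without a live Scrapy response, here is a minimal standard-library sketch. It is not Scrapy's API: it assumes the pagination fragment happens to be well-formed XML, and the &raquo; entity is written as a literal » because ElementTree does not know HTML entities. ElementTree supports only a subset of XPath and cannot select /@title directly, so the element is selected with the same class predicate and the attribute is then read with .get(). The names SNIPPET, next_link and next_title are only illustrative.

```python
import xml.etree.ElementTree as ET

# Pagination markup from the question, with &raquo; replaced by a literal
# character so that an XML parser accepts it (an assumption of this sketch).
SNIPPET = """<div id="PaginationBottom" class="pagination">
    <a href="#" data-page="2" title="page 2 of 31" >2</a>
    <a href="#" data-page="3" title="page 3 of 31" >3</a>
    <a href="#" data-page="4" title="page 4 of 31" >4</a>
    <a href="#" data-page="10" title="page 10 of 31" >10</a>
    <a href="#" data-page="2" title="page 2 of 31" class="next" >next »</a>
</div>"""

# The root element is the <div>; pick its child <a> whose class is "next".
root = ET.fromstring(SNIPPET)
next_link = root.find("./a[@class='next']")

# Read the attribute value from the selected element.
next_title = next_link.get("title")
print(next_title)
```

On real pages, which are rarely valid XML, the Scrapy selector shown above is the more robust choice.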

As a follow-up: you would probably also want the total number of pages from the title attribute value, that is, the 31 in "page 2 of 31". The built-in re() method of Scrapy selectors is helpful here:

>>> response.xpath('//div[@id="PaginationBottom"]/a[@class="next"]/@title').re(r'page \d+ of (\d+)')
[u'31']
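
The extracted values are strings, so they usually need converting before they can drive a crawl loop. A rough sketch using only Python's re module is below; parse_page_title is a hypothetical helper name, and the "page N of M" format is assumed from the sample HTML in the question.

```python
import re

def parse_page_title(title):
    """Return (page_number, total_pages) as ints, or None if the title does not match."""
    match = re.search(r"page (\d+) of (\d+)", title)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# The title value extracted from the "next" link in the question.
page_number, total_pages = parse_page_title("page 2 of 31")
print(page_number, total_pages)
```

Returning None for an unexpected title lets the caller decide whether to stop paginating or log the problem.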

Upvotes: 2
