AndriuZ

Reputation: 656

Kimono XPath for pagination without next>

I don't understand how to paginate when scraping with Kimono if the navigation has no "next >" link, e.g. for this paging structure:

<div class="pages" style="clear: both;">
    <span>1</span>    
    <a href="/page=2">2</a>
    <a href="/page=3">3</a>
    <a href="/page=4">4</a>
</div>

The XPath for this CSS selector gives results only for page 2:

div.pages > a

I want to have a single API (i.e. I don't want to generate the URL list with an additional API).

Upvotes: 1

Views: 1232

Answers (2)

Gabrielius

Reputation: 1056

Below are an XPath expression and a CSS selector that select all the a elements used for paging:

  • XPath: //descendant::*[1]/a[contains(@href, 'page=')]

  • CSS selector: div[id=results] div[class~=pull-right] a

div[class~=pull-right] selects every div whose class attribute contains pull-right as one of its whitespace-separated class names.

I don't particularly like this CSS selector, but for some reason Kimono does not allow a[href]-style selection. Ideally you would use something like this:

  • Better CSS selector: div[id=results] a[href*=page]
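To make the substring match concrete, here is a minimal Python sketch of what contains(@href, 'page=') does, run against the markup from the question. It uses only the standard library and is not how Kimono works internally; the function name paging_links is hypothetical.

```python
import xml.etree.ElementTree as ET

# Paging markup copied from the question.
MARKUP = """
<div class="pages" style="clear: both;">
    <span>1</span>
    <a href="/page=2">2</a>
    <a href="/page=3">3</a>
    <a href="/page=4">4</a>
</div>
"""


def paging_links(markup):
    """Return the href of every <a> whose href contains 'page='."""
    root = ET.fromstring(markup.strip())
    return [
        a.get("href")
        for a in root.iter("a")
        if "page=" in a.get("href", "")  # substring test, like contains()
    ]


print(paging_links(MARKUP))  # prints ['/page=2', '/page=3', '/page=4']
```

Note that this matches every paging link at once, so by itself it does not say which link is the "next" page.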

Upvotes: 0

trip41

Reputation: 1226

You have two options.

(a) Try div.pages > span + a. This 'next page' selector always selects the next page and stops on the last page. The example markup shows that the currently selected page is a span and the next page link is the a immediately after it. The adjacent sibling selector + selects an a that comes directly after a span. Note: You didn't provide a link to the target site, so it's not guaranteed that this will work, but based on your example markup, it should.
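As a rough illustration of why span + a behaves like a 'next' link, the sketch below emulates the adjacent sibling rule in plain Python on markup shaped like the example. The function name next_page_href and the page-4 markup are assumptions made for illustration; the real site may differ.

```python
import xml.etree.ElementTree as ET

# First page: markup copied from the question.
FIRST_PAGE = """
<div class="pages" style="clear: both;">
    <span>1</span>
    <a href="/page=2">2</a>
    <a href="/page=3">3</a>
    <a href="/page=4">4</a>
</div>
"""

# Last page: assumed shape, with the current page (4) rendered as a span.
LAST_PAGE = """
<div class="pages" style="clear: both;">
    <a href="/page=1">1</a>
    <a href="/page=2">2</a>
    <a href="/page=3">3</a>
    <span>4</span>
</div>
"""


def next_page_href(markup):
    """Emulate `div.pages > span + a`: the <a> directly after the <span>."""
    root = ET.fromstring(markup.strip())  # root is the div.pages element
    children = list(root)
    for current, following in zip(children, children[1:]):
        if current.tag == "span" and following.tag == "a":
            return following.get("href")
    return None  # no <a> follows the <span>, so pagination stops


print(next_page_href(FIRST_PAGE))  # prints /page=2
print(next_page_href(LAST_PAGE))   # prints None
```

On the last page the span has no following a, so nothing matches and the crawl ends there.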

(b) Simply enter a list of URLs manually for this API to crawl. It looks like the list you'd want is:

http://www.thissiteurl.com/page=1
http://www.thissiteurl.com/page=2
http://www.thissiteurl.com/page=3
...
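If typing the list by hand is tedious, a few lines of Python can produce it for pasting. This is only a sketch: the base URL is the placeholder used above, the function name page_urls is hypothetical, and the last page number (4 here, taken from the example markup) is an assumption you would replace with the real count.

```python
def page_urls(base, last_page):
    """Build one URL per page, from page 1 up to and including last_page."""
    return [f"{base}/page={n}" for n in range(1, last_page + 1)]


# Assumed values: placeholder host from the answer, 4 pages from the example markup.
for url in page_urls("http://www.thissiteurl.com", 4):
    print(url)
```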

Upvotes: 1
