Reputation: 656
I don't understand how to paginate when scraping with Kimono if the navigation has no "next >" link, e.g. for this paging structure:
<div class="pages" style="clear: both;">
<span>1</span>
<a href="/page=2">2</a>
<a href="/page=3">3</a>
<a href="/page=4">4</a>
</div>
The following XPath/CSS selector gives results only for page 2:
div.pages > a
I want a single API (i.e. I don't want to generate the URL list with an additional API).
Upvotes: 1
Views: 1232
Reputation: 1056
Below you will find an XPath expression and a CSS selector that select all a elements used for paging:
XPath: //descendant::*[1]/a[contains(@href, 'page=')]
CSS selector: div[id=results] div[class~=pull-right] a
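div[class~=pull-right] means you want to select all divs whose class attribute contains the word pull-right as one of its space-separated values (the class attribute does not have to be exactly pull-right).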
I don't quite like this CSS selector, but for some reason Kimono does not allow a[href]-type selection. Ideally you would use something like this:
div[id=results] a[href*=page]
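The two attribute operators are easy to mix up: ~= matches a whole space-separated word of the attribute value, while *= matches any substring. The Python below is only a hypothetical emulation of those two rules (the helper names are made up for illustration and are not part of Kimono or any CSS library); it shows why class~=pull-right works on a multi-class div, and why a word match on "page" would not match an href such as /page=2 while a substring match would.

```python
def matches_word(attr_value: str, word: str) -> bool:
    """Emulate CSS [attr~=word]: true if `word` is one of the
    whitespace-separated tokens of the attribute value."""
    return word in attr_value.split()


def matches_substring(attr_value: str, text: str) -> bool:
    """Emulate CSS [attr*=text]: true if `text` occurs anywhere in the
    attribute value (an empty `text` never matches, as in CSS)."""
    return bool(text) and text in attr_value


# class~=pull-right matches a div with several classes, but not a longer word
print(matches_word("btn pull-right", "pull-right"))  # True
print(matches_word("pull-right-xs", "pull-right"))   # False

# href*=page matches the paging links; a word match on "page" does not
print(matches_substring("/page=2", "page"))          # True
print(matches_word("/page=2", "page"))               # False
```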
Upvotes: 0
Reputation: 1226
You have two options.
(a) Try div.pages > span + a. This 'next page' selector will always select the next page and will stop on the last page. The example markup shows that the currently selected page is a span and the next page link is the a right after it. You can use the adjacent sibling selector + to select an a that comes immediately after a span. Note: you didn't include a link to the target site, so it's not guaranteed this will work, but based on your example markup, it should.
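Kimono evaluates the selector itself, so the following is only a sketch of what div.pages > span + a resolves to, written with Python's standard html.parser. The class and function names are hypothetical. It looks at the direct element children of div.pages and returns the href of an a that immediately follows the span; on the last page nothing follows the span, so it returns None, which is what makes the selector stop.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NextPageFinder(HTMLParser):
    """Emulates `div.pages > span + a` and keeps the first match's href."""

    def __init__(self):
        super().__init__()
        self.level = 0            # current element nesting depth
        self.pages_level = None   # depth of the open div.pages, if any
        self.prev_child = None    # tag of the previous direct child of div.pages
        self.next_href = None     # href of the "next page" link, if found

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        depth = self.level + 1    # depth at which this element sits
        if self.pages_level is None:
            classes = (attrs.get("class") or "").split()
            if tag == "div" and "pages" in classes:
                self.pages_level = depth
                self.prev_child = None
        elif depth == self.pages_level + 1:
            # Direct child of div.pages: is it an <a> right after a <span>?
            if tag == "a" and self.prev_child == "span" and self.next_href is None:
                self.next_href = attrs.get("href")
            self.prev_child = tag
        if tag not in VOID_TAGS:
            self.level = depth

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.level == self.pages_level:
            self.pages_level = None  # leaving div.pages
        self.level -= 1


def next_page_href(html):
    finder = NextPageFinder()
    finder.feed(html)
    finder.close()
    return finder.next_href


SAMPLE = """<div class="pages" style="clear: both;">
<span>1</span>
<a href="/page=2">2</a>
<a href="/page=3">3</a>
<a href="/page=4">4</a>
</div>"""

print(next_page_href(SAMPLE))  # /page=2
print(next_page_href('<div class="pages"><a href="/page=1">1</a><span>2</span></div>'))  # None
```

Because only element siblings are compared, the whitespace text between the tags does not break the span + a adjacency, which matches how the CSS adjacent sibling combinator behaves.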
(b) Simply enter a list of URLs manually for this API to crawl. It looks like the list you'd want is:
http://www.thissiteurl.com/page=1
http://www.thissiteurl.com/page=2
http://www.thissiteurl.com/page=3
...
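If the list is long, it can be generated rather than typed. This is a minimal Python sketch, assuming the placeholder host from the answer and that the page number is the only part of the URL that changes; the function name is hypothetical, and the real number of pages has to be read from the target site (the sample markup only shows pages 1 to 4).

```python
from typing import List

# Placeholder host taken from the answer; replace with the real site.
BASE_URL = "http://www.thissiteurl.com/page={}"


def page_urls(last_page: int) -> List[str]:
    """Build the URL list for pages 1..last_page (inclusive)."""
    return [BASE_URL.format(n) for n in range(1, last_page + 1)]


for url in page_urls(4):
    print(url)
```

The printed lines can then be pasted into the API's manual URL list.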
Upvotes: 1