Reputation: 2686
I have existing Scrapy code, but I am having trouble formulating a NEXT_PAGE_SELECTOR that will select the element via a CSS selector in Scrapy:
def parse(self, response):
    '''
    get the first page of results.
    '''
    # Assuming b_algo is a class name, the selector needs a leading dot;
    # a bare 'b_algo' would only match a <b_algo> element.
    SET_SELECTOR = '.b_algo'
    for bresult in response.css(SET_SELECTOR):
        NAME_SELECTOR = 'h2 a ::text'
        yield {
            'name': bresult.css(NAME_SELECTOR).extract_first(),
        }
    '''
    get the further pages of results.
    '''
    #<<NEXT_PAGE_SELECTOR here>>
The HTML I'm trying to match is:
<ul class="sb_pagF" aria-label="More pages with results">
<li>
<a title="Next page" class="sb_pagN" href="/search?q=site%3asite.com&first=11&FORM=PORE">
<div class="sw_next">Next
</div>
</a>
</li>
</ul>
I've formulated the following to match this:
NEXT_PAGE_SELECTOR = '.sb_pagF li a ::attr(href)'
Does this look right to grab the href?
Thanks!
Upvotes: 0
Views: 322
Reputation: 21436
Yes, it is correct:
$ scrapy shell
In [1]: foo = """<ul class="sb_pagF" aria-label="More pages with results">
<li>
<a title="Next page" class="sb_pagN" href="/search?q=site%3asite.com&first=11&FORM=PORE">
<div class="sw_next">Next
</div>
</a>
</li>
</ul>"""
In [2]: from scrapy import Selector
In [3]: sel = Selector(text=foo)
In [4]: sel.css('.sb_pagF li a ::attr(href)').extract()
Out[4]: [u'/search?q=site%3asite.com&first=11&FORM=PORE']
Upvotes: 3
Reputation: 473893
You can always test your selectors in the Scrapy shell by pointing it at a local HTML file:
$ cat index.html
<ul class="sb_pagF" aria-label="More pages with results">
<li>
<a title="Next page" class="sb_pagN" href="/search?q=site%3asite.com&first=11&FORM=PORE">
<div class="sw_next">Next
</div>
</a>
</li>
</ul>
$ scrapy shell file://$PWD/index.html
In [1]: response.css(".sb_pagF li a ::attr(href)").extract_first()
Out[1]: u'/search?q=site%3asite.com&first=11&FORM=PORE'
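The value extracted above is a relative URL, so it has to be resolved against the address of the page it came from before it can be requested. The sketch below illustrates that step with the standard library only; the page URL and the helper name absolute_next_page are assumptions made for the example. Inside a spider, response.urljoin(href) plays a similar role.

```python
from urllib.parse import urljoin


def absolute_next_page(page_url, href):
    """Resolve a possibly relative href against the URL of the page it was found on."""
    return urljoin(page_url, href)


# Hypothetical page URL; the href is the one extracted in the shell session.
page_url = "https://www.example.com/search?q=site%3asite.com"
href = "/search?q=site%3asite.com&first=11&FORM=PORE"

print(absolute_next_page(page_url, href))
```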
Upvotes: 3