Match html output for result scrapy (skip first match)

Question

I have existing Scrapy code, but I'm having trouble writing a NEXT_PAGE_SELECTOR that selects the next-page link with a CSS selector in Scrapy:

def parse(self, response):
    # Get the first page of results.
    # "b_algo" is a class name, so the CSS selector needs a leading dot.
    SET_SELECTOR = '.b_algo'
    for bresult in response.css(SET_SELECTOR):
        NAME_SELECTOR = 'h2 a ::text'
        yield {
            'name': bresult.css(NAME_SELECTOR).extract_first(),
        }

    # Get the further pages of results.
    #<>

The HTML I'm trying to match is:

            Next

I've come up with the following selector to match it:

NEXT_PAGE_SELECTOR = '.sb_pagF li a ::attr(href)'

Does this look right to grab the href?

Thanks!

Granitosaurus · Accepted Answer

Yes, it is correct:

$ scrapy shell
In [1]: foo = """

            Next

"""
In [2]: from scrapy import Selector
In [3]: sel = Selector(text=foo)
In [4]: sel.css('.sb_pagF li a ::attr(href)').extract()
Out[4]: [u'/search?q=site%3asite.com&first=11&FORM=PORE']