Using Scrapy and Splash to Follow javascript pagination

Question

I am using Scrapy and splash to extract the data. I am looking to find a way to follow pagination that was powered with javascript. The URL is not changing it is always the same no matter on what page you are.

Next

I have tried with lua script and splash to click on the element but this does not work:

    """function main(splash)
local url = splash.args.url
assert(splash:go(url))
assert(splash:wait(1))
assert(splash:runjs('document.getElementsByClassName("btn-next")[0].click()'))
assert(splash:wait(0.75))
-- return result as a JSON object
return {html = splash:html()}                
end """



def parse(self, response):
    section = response.css('li.li-result')
    for item in section:
        yield{
            'manufacturer' :  item.css('span.brand::text').extract_first(),
            'model' : item.css('span.sub-title::text').extract_first(),
            'engine_size' :  item.css('span.nowrap::text').extract_first(),
            'model_type' : item.css('span span.nowrap::text').extract_first(),
            'old_price' : item.css('li.li-result p.old-prix span::text').extract_first(),    
            'price' : item.css('li.li-result p.prix::text').extract_first(),
            'consumption' : item.css('li.li-result div.desc::text').extract_first(),
            'date' : item.css('p.btn-publication::text').extract_first(),
            'fuel_type' : item.css('div.bc-info div.upper::text').extract_first(),
            'mileage' : item.css('li.li-result div.bc-info ul div::text')[1].extract(),
            'year' : item.css('li.li-result div.bc-info ul div::text')[2].extract(),
            'transmission_type' : item.css('li.li-result div.bc-info ul div::text')[3].extract(),
            'add_number' : item.css('li.li-result div.bc-info ul div::text')[4].extract(),

        }

    next_page = response.css('li.btn-next').extract_first() #pagination    
    if next_page != 0:
        print(response)
        yield(SplashRequest(next_page, self.parse,
            endpoint='execute',
            cache_args=['lua_source'],
            args={'lua_source': script},  

        ))

Is it even possible to do it in this way? Appreciate help.

Using Scrapy and Splash to Follow javascript pagination

Answers (1)

Solution

Some alternatives

Related Questions