Scrapy - Shell crawls the page without any problem, but selectors fail

Question

I have this start URL for crawling:

https://autocarro.com.br/auto-busca/carros?AutoBusca=1&qc=&qt=&q=&ai=&af=&pi=&pf=&com=&cam=&cor=&por=&est=&cid=#1

When I send a request from the Scrapy shell, the page is fetched without any problems, and view(response) shows the fully rendered page.

However, when I try to use selectors to get the a tags, they don't work. It's as if the whole HTML table body is not there.

response.css('tbody').getall() returns an empty table body, so the a tags I'm looking for are not there.

I also checked whether there is an AJAX request that I'm missing, but there is not. What's the problem here?

gangabass · Accepted Answer

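You need to check the page's source HTML (usually Ctrl+U in a browser) rather than the rendered page. view(response) opens the response in your browser, which runs the page's JavaScript, so the table looks complete there even though the HTML that Scrapy received has an empty tbody. For your URL, you'll find that the target table is filled in by JavaScript from an array that starts with var COLLECTION = [. You can parse that part with the code below: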

import json

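def parse(self, response):
    # The table rows live in an inline <script> as a JSON array, not in <tbody>.
    json_collection = response.xpath('//script[contains(., "var COLLECTION = [")]').re_first(r'var COLLECTION = (\[.+?\]);')
    if json_collection is None:
        return  # the script block was not found in this response
    data = json.loads(json_collection)  # now you have everything you need here
    for element in data:
        mark = element["mar"]
        version = element["ver"]
        # ........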
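
To try the extraction step outside of Scrapy, here is a minimal sketch that uses only the standard library. The SAMPLE_HTML string, its values, and the extract_collection helper are made up for illustration; only the "var COLLECTION = [" marker and the "mar" and "ver" keys come from the answer above. It also passes re.DOTALL, on the assumption that the array might span several lines in the real page.

```python
import json
import re

# Hypothetical stand-in for the raw page source: an empty <tbody> plus an
# inline script holding the data. The real page's markup and values differ.
SAMPLE_HTML = """
<html><body>
<table><tbody></tbody></table>
<script>
var COLLECTION = [{"mar": "Fiat", "ver": "Uno 1.0"}, {"mar": "Ford", "ver": "Ka 1.5"}];
</script>
</body></html>
"""

# Capture the array literal assigned to COLLECTION, up to the first "];".
# re.DOTALL lets "." match newlines in case the array is spread over lines.
COLLECTION_RE = re.compile(r"var COLLECTION = (\[.+?\]);", re.DOTALL)


def extract_collection(html):
    """Return the list assigned to `var COLLECTION`, or [] if it is absent."""
    match = COLLECTION_RE.search(html)
    if match is None:
        return []
    return json.loads(match.group(1))


items = extract_collection(SAMPLE_HTML)
marks = [item["mar"] for item in items]
versions = [item["ver"] for item in items]
```

Because the pattern stops at the first "];", it would cut the array short if a string value inside the data happened to contain that sequence, so treat it as a starting point rather than a general JavaScript parser.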
