Mike

Reputation: 331

Waiting for the loading page with scrapy

I'm trying to fetch the content of a webpage using FormRequest to get past a form. The problem is that after the form there is a page with a loading bar, and the site only shows the content I want once the bar is full. My Scrapy script gets the loading page in the Response object, not the final page with the results. What can I do to solve this? I suspect I need to set a timer so the crawler waits for the loading page to finish.

Upvotes: 1

Views: 7412

Answers (1)

alexizydorczyk

Reputation: 920

There's no concept of waiting when doing basic HTML scraping. Scrapy makes a request to a web server and receives a response - that response is all you get.

In all likelihood, the loading bar on the page is using JavaScript to render the results of the page. An ordinary browser will appear to wait on the page - under the hood, it's running JavaScript and likely making more requests to a web server before it has enough information to render the page.
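If the extra requests the browser makes can be identified (for example in the browser's network tab), one possible approach is to repeat such a request until the results show up. The snippet below is only a rough sketch of that idea in plain Python, not a Scrapy spider: `poll_until_ready`, `fetch` and `is_ready` are hypothetical names, and it assumes the site exposes some URL that eventually returns the finished content.

```python
import time
from typing import Callable, Optional


def poll_until_ready(fetch: Callable[[], str],
                     is_ready: Callable[[str], bool],
                     attempts: int = 10,
                     delay: float = 1.0) -> Optional[str]:
    """Call fetch() until is_ready(body) is true or attempts run out.

    fetch and is_ready are hypothetical callables supplied by the caller:
    fetch returns the body of one response, is_ready decides whether that
    body is the final page rather than the loading page.
    """
    for attempt in range(attempts):
        body = fetch()
        if is_ready(body):
            return body
        # Pause between tries, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```

Note that `time.sleep` blocks, so this is meant as an illustration outside Scrapy rather than something to call inside a spider callback.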

In order to replicate the result programmatically, you will have to somehow render that JavaScript. Unfortunately, Scrapy does not have that capability built in.

Some options you have include:

Selenium (drives a real browser): http://www.seleniumhq.org/

Splash (a JavaScript rendering service with an HTTP API): https://github.com/scrapinghub/splash
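To give a rough idea of how the Splash option addresses the "wait" part of the question: Splash exposes a `render.html` HTTP endpoint that takes the target `url` and a `wait` time in seconds, and returns the HTML after the page's JavaScript has run. The sketch below only builds such a URL. The helper name `splash_render_url`, the `localhost:8050` address and the example target URL are assumptions for illustration, and it does not cover submitting the form itself.

```python
from urllib.parse import urlencode


def splash_render_url(target_url, wait=2.0, splash_base="http://localhost:8050"):
    """Build a URL for Splash's render.html endpoint.

    splash_base is an assumed address of a locally running Splash
    instance; wait is how many seconds Splash should let the page's
    JavaScript run before returning the rendered HTML.
    """
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


# Hypothetical target page; a spider could request this URL
# instead of the original one.
print(splash_render_url("http://example.com/results", wait=5))
```

A spider would then request the Splash URL in place of the original page, so the Response it receives contains the rendered content rather than the loading page, provided the chosen wait is long enough for the loading bar to finish.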

Upvotes: 2
