Gustavo Cunha

Reputation: 11

Wait for result to load in Python Data Scraping

I am writing a data scraper, but I don't know how to make Python wait for the request I made to finish loading.

I am pulling a table from this link: http://www.ans.gov.br/perfil-do-setor/dados-e-indicadores-do-setor/sala-de-situacao

Steps to reproduce: go to Caderno 2.0, select the first value in the drop-down list, and run any query.

The main problem is this: when I run the query on the website, it takes a while to produce the output, so I need to figure out how to make Python wait until the result appears (see picture below).

Image execution message

Can someone help me with that? Please.

Thanks so much!

Upvotes: 1

Views: 250

Answers (1)

nmog

Reputation: 236

The website takes so long to load because it uses heavy JavaScript to render the page.

You can use Splash, a service for rendering JavaScript-based pages. It runs in Docker quite easily: you make HTTP requests to the Splash container, which returns the HTML as it would look after rendering in a web browser.

Although this sounds overly complicated, it is actually quite simple to set up since you don't need to modify the Docker image at all, and you need no previous knowledge of Docker to get it to work. It requires just a single line to start a local Splash server:
docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash

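By default a render request times out after 30 seconds, and Splash also caps how large a timeout a request may ask for. If you need to wait longer for the page to render, raise that cap with --max-timeout when starting the container and pass a matching timeout argument in the request. For example, to allow timeouts of up to 300 seconds: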
docker run -it -p 8050:8050 scrapinghub/splash --max-timeout 300

You then just modify any existing requests in your Python code to go through Splash instead. For example, http://example.com/ becomes
http://localhost:8050/render.html?url=http://example.com/
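
As a rough sketch of what that rewrite could look like in Python, assuming Splash is listening on localhost:8050 as started above: the helper names build_splash_url and fetch_rendered_html are made up for illustration, and the wait and timeout values are only guesses you would need to tune. wait and timeout are query arguments of Splash's render.html endpoint; wait is a fixed delay after the page loads, so it only helps if the data shows up without any clicks.

```python
import urllib.parse
import urllib.request

# Assumption: the local Splash container from the docker command above.
SPLASH_RENDER_URL = "http://localhost:8050/render.html"


def build_splash_url(target_url, wait=5.0, timeout=60.0):
    """Build the render.html URL that asks Splash to render target_url.

    wait    -- seconds Splash keeps waiting after the page has loaded
    timeout -- seconds before Splash gives up on the render; values above
               Splash's configured maximum need --max-timeout at startup
    """
    query = urllib.parse.urlencode({
        "url": target_url,  # percent-encoded so its own query string survives
        "wait": wait,
        "timeout": timeout,
    })
    return f"{SPLASH_RENDER_URL}?{query}"


def fetch_rendered_html(target_url, wait=5.0, timeout=60.0):
    """Return the HTML of target_url after Splash has rendered it."""
    splash_url = build_splash_url(target_url, wait, timeout)
    # Give the HTTP client a little longer than Splash's own timeout,
    # so Splash's error response arrives before the client gives up.
    with urllib.request.urlopen(splash_url, timeout=timeout + 10) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (needs a running Splash container):
#   html = fetch_rendered_html("http://example.com/")
#   print(html[:200])
```

The returned HTML can then be passed to whatever parser the scraper already uses. If the table only appears after selecting a value and running the query, a fixed wait will not be enough on its own, because render.html does not interact with the page.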


Alternatively, you can use Selenium as another user commented above, but I personally have had an easier time using Splash.

Upvotes: 1
