Wait before returning contents of web-page

Question

I'm trying to scrape this website: http://www.fivb.org/EN/BeachVolleyball/PlayersRanking_W.asp, but this page loads the contents of the table (probably through AJAX), after the page has been loaded.

My attempt:

import requests
from bs4 import BeautifulSoup

uri = 'http://www.fivb.org/EN/BeachVolleyball/PlayersRanking_W.asp'

r = requests.get(uri)
soup = BeautifulSoup(r.content, 'html.parser')  # name the parser explicitly
print(soup)

But the div with the id='BTechPlayM' remains empty, regardless of what I do. I've tried:

Setting a timeout on the request: requests.get(uri, timeout=10)
Passing headers
Using eventlet, to set a delay
Most recently, using the selenium library to drive PhantomJS (installed from npm), but this rabbit hole just kept getting deeper and deeper.

Is there a way to send a request to a URI, wait X seconds, and then return the contents?

Or to send a request to a URI, keep checking whether a div contains an element, and only return the contents once it does?

Keyur Potdar · Accepted Answer

Short answer: No. You cannot do that using requests.

As you suspected, the table data is generated dynamically using JavaScript. It is fetched from a separate URL (the one requested in the complete code below). The response is not JSON, though; it is JavaScript source. The rows you need appear in it as list literals, so you can pull them out with a regular expression.
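To illustrate the extraction step on its own, here is a minimal sketch that runs offline. The sample_js string is a made-up stand-in for the response (the real payload's layout may differ); the pattern simply matches any bracketed run that contains no nested brackets.

```python
import re

# Hypothetical stand-in for the JavaScript response; the real layout may differ.
sample_js = (
    'var rows = [["1", "Humana-Paredes", "CAN", "4", "1,720", ""], '
    '["", "Pavan", "CAN", "4", "1,720", ""]];'
)

# Match "[" ... "]" where the inside contains no other brackets,
# so only the innermost lists are returned.
matches = re.findall(r'\[[^\[\]]*\]', sample_js)

for m in matches:
    print(m)
```

Each element of matches is still plain text at this point.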

Each regex match is still a string, though, not an actual list. You can convert it to a list using ast.literal_eval(). For example, one match looks like this:

'["1", "Humana-Paredes", "CAN", "4", "1,720", ""]'
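As a quick check of the conversion, this small sketch feeds that exact string to ast.literal_eval(), which parses Python literals without executing any code:

```python
import ast

raw = '["1", "Humana-Paredes", "CAN", "4", "1,720", ""]'

# literal_eval only accepts literals (strings, numbers, lists, ...),
# so it is a safer choice than eval() for text fetched from the web.
details = ast.literal_eval(raw)

print(type(details).__name__)
print(details[1])
```

After this call, details is a real list and can be indexed normally.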

Complete code:

import re
import requests
import ast

r = requests.get('http://www.fivb.org/Vis/Public/JS/Beach/TechPlayRank.aspx?Gender=1&id=BTechPlayW&Date=20180326')
data = re.findall(r'(\[[^\[\]]*\])', r.text)
for player in data:
    details = ast.literal_eval(player)
    print(details)  # this var is a list (format shown below)

Partial output:

['1', 'Humana-Paredes', 'CAN', '4', '1,720', '']
['', 'Pavan', 'CAN', '4', '1,720', '']
['3', 'Talita', 'BRA', '4', '1,660', '']
['', 'Larissa', 'BRA', '4', '1,660', '']
['5', 'Hermannova', 'CZE', '4', '1,360', '']
['', 'Slukova', 'CZE', '4', '1,360', '']
['7', 'Laboureur', 'GER', '4', '1,340', '']
...

The basic format of this list (details) is:

[<rank>, <name>, <country>, <...>, <points>, <...>]

You can use this data however you want. For example, details[1] gives the player's name on each iteration of the loop.
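As a rough sketch of what that could look like, the snippet below works on the first three rows of the partial output. It assumes that index 1 is the name and that index 4 holds the points with a comma as the thousands separator; the second assumption is inferred from the output, not confirmed by the site.

```python
# First three rows copied from the partial output.
rows = [
    ['1', 'Humana-Paredes', 'CAN', '4', '1,720', ''],
    ['', 'Pavan', 'CAN', '4', '1,720', ''],
    ['3', 'Talita', 'BRA', '4', '1,660', ''],
]

# Collect every name (index 1).
names = [row[1] for row in rows]

# Assumption: index 4 is the points column; strip the comma before converting.
points = [int(row[4].replace(',', '')) for row in rows]

for name, pts in zip(names, points):
    print(name, pts)
```

In the complete code you would build rows by appending details inside the loop instead of hard-coding it.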
