Reputation: 2578
I'm trying to scrape this website: http://www.fivb.org/EN/BeachVolleyball/PlayersRanking_W.asp, but the page fills in the contents of the table (probably through AJAX) only after the page itself has loaded.
My attempt:
import requests
from bs4 import BeautifulSoup, Comment
uri = 'http://www.fivb.org/EN/BeachVolleyball/PlayersRanking_W.asp'
r = requests.get(uri)
soup = BeautifulSoup(r.content, 'html.parser')
print(soup)
But the div with the id='BTechPlayM' remains empty, regardless of what I do. I've tried:
requests.get(uri, timeout=10)
Is there a way to send a request to a URI, wait X seconds, and then return the contents?
... Or to send a request to a URI, keep checking whether a div contains an element, and return the contents only once it does?
Upvotes: 1
Views: 827
Reputation: 7238
Short answer: No. You cannot do that using requests.
But, as you said, the table data is generated dynamically using JavaScript. The data is fetched from a separate URL (the one requested in the code below). The response is not JSON; it is JavaScript. The required data appears in it as list literals, which you can extract using a regular expression.
But each regex match is a string, not an actual list. You can convert the string to a list using ast.literal_eval(). For example, the data looks like this:
'["1", "Humana-Paredes", "CAN", "4", "1,720", ""]'
Complete code:
import re
import requests
import ast
r = requests.get('http://www.fivb.org/Vis/Public/JS/Beach/TechPlayRank.aspx?Gender=1&id=BTechPlayW&Date=20180326')
data = re.findall(r'(\[[^[\]]*])', r.text)
for player in data:
    details = ast.literal_eval(player)
    print(details)  # this var is a list (format shown below)
Partial output:
['1', 'Humana-Paredes', 'CAN', '4', '1,720', '']
['', 'Pavan', 'CAN', '4', '1,720', '']
['3', 'Talita', 'BRA', '4', '1,660', '']
['', 'Larissa', 'BRA', '4', '1,660', '']
['5', 'Hermannova', 'CZE', '4', '1,360', '']
['', 'Slukova', 'CZE', '4', '1,360', '']
['7', 'Laboureur', 'GER', '4', '1,340', '']
...
The basic format of this list (details) is:
[<Rank>, <Name>, <Country>, <Nb. part.>, <Points>, <Entry pts.>]
You can use this data however you want. For example, details[1] gives the player's name on each iteration of the loop.
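As a rough offline illustration of working with these lists, the sketch below parses a small hard-coded sample instead of calling the site. The sample text, the `parse_rows` helper, and the dictionary keys are assumptions made up for this example; the sample is built from the rows in the partial output above, and the real response may be laid out differently.

```python
import ast
import re

# Assumed sample: a JavaScript-like fragment built from the rows shown in the
# partial output above. The real response may be laid out differently.
sample_js = '''
rows.push(["1", "Humana-Paredes", "CAN", "4", "1,720", ""]);
rows.push(["", "Pavan", "CAN", "4", "1,720", ""]);
rows.push(["3", "Talita", "BRA", "4", "1,660", ""]);
'''

# Matches a bracketed list literal that contains no nested brackets.
ROW_PATTERN = re.compile(r'\[[^\[\]]*\]')


def parse_rows(text):
    """Turn every six-field list literal found in text into a dictionary."""
    rows = []
    for match in ROW_PATTERN.findall(text):
        rank, name, country, participations, points, entry_points = ast.literal_eval(match)
        rows.append({
            "rank": rank,  # empty string for the second player of a team
            "name": name,
            "country": country,
            "participations": int(participations),
            "points": int(points.replace(",", "")),  # "1,720" -> 1720
            "entry_points": entry_points,
        })
    return rows


players = parse_rows(sample_js)
names = [player["name"] for player in players]
print(names)
```

Collecting dictionaries rather than printing each list keeps all rows available after the loop, so getting every name becomes a single list comprehension.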
Upvotes: 1
Reputation: 2103
You can use Selenium, since requests has no option to wait for the page's JavaScript to run:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
uri = 'http://www.fivb.org/EN/BeachVolleyball/PlayersRanking_W.asp'
# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service("./chromedriver"))  # path to the downloaded chromedriver
browser.set_page_load_timeout(60)
browser.get(uri)  # open the page in a real browser so its JavaScript runs
text = browser.page_source
browser.quit()
soup = BeautifulSoup(text, 'html.parser')
print(soup)
You will have to download chromedriver.
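The "keep checking until the div has content" idea from the question can be sketched as a plain polling loop, independent of any browser library. The `wait_until` helper and the fake condition below are hypothetical names invented for this illustration; they are not part of requests or Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if the condition is still falsy once timeout
    seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for "does the div contain rows yet?": truthy from the third check on.
attempts = {"count": 0}


def fake_table_loaded():
    attempts["count"] += 1
    return ["row"] if attempts["count"] >= 3 else []


rows = wait_until(fake_table_loaded, timeout=2.0, poll_interval=0.01)
print(rows, attempts["count"])
```

With Selenium, the condition would be a callable that inspects the live page, for example one that checks whether the target div has any child elements yet. Selenium's WebDriverWait(browser, timeout).until(...) provides the same kind of polling built in, so a hand-written loop like this is mainly useful for seeing what the wait does.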
Upvotes: 0