Reputation: 57
I want to scrape the information from this page:
https://databases.usatoday.com/nfl-arrests/
Each arrest is listed in a table on the page, under the CSS selector #csp-data.
I can see this element in the page's source as well: <div id="csp-data" class="csp-data"></div>
but there is nothing between those tags for me to parse.
When I run the following code, I get no results.
import requests
from bs4 import BeautifulSoup
url = "https://databases.usatoday.com/nfl-arrests/"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
test = soup.select('#csp-data > div > div:nth-child(3) > div > div.table-responsive > table > tbody')
print(test)
If I use test = soup.select('#csp-data'), I get <div class="csp-data" id="csp-data"></div>
If I move to the next step, #csp-data > div, I get no results.
I'm assuming the table hasn't been loaded yet when requests fetches the page, but I'm not sure. When I open the page in my browser and use Inspect Element, I can see that the table has loaded.
Does anyone have an idea on how I could move forward here?
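One quick way to test the assumption above is to check whether the container is present but empty in the raw HTML that requests returns. The sketch below is only a crude regex check, not a real HTML parser; container_is_empty is a hypothetical helper name, and the sample string is the div quoted above rather than a live response.

```python
import re

def container_is_empty(html, element_id):
    """Return True if <div id=element_id> is present but has no content.

    A crude regex check for a quick diagnosis, not a general HTML parser.
    """
    pattern = r'<div\b[^>]*\bid="' + re.escape(element_id) + r'"[^>]*>\s*</div>'
    return re.search(pattern, html) is not None

# The div as it appears in the page source (no table inside it).
raw = '<body><div id="csp-data" class="csp-data"></div></body>'
# A made-up example of what the element might look like once it is filled in.
rendered = '<body><div id="csp-data" class="csp-data"><div><table></table></div></div></body>'

print(container_is_empty(raw, "csp-data"))
print(container_is_empty(rendered, "csp-data"))
```

If the check is true for r.text but the browser's inspector shows a table, the content is most likely added by JavaScript after the initial response.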
Upvotes: 1
Views: 80
Reputation: 16187
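The table is filled in by an AJAX call after the page loads, so it is not in the HTML that requests receives. You can call that endpoint directly. Here is working code: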
import requests

# Form fields for the AJAX request. The security value is presumably a
# site-issued token; if the request starts failing, it may need refreshing.
body = 'action=cspFetchTable&security=3193d24eb0&pageID=10&blogID=&sortBy=Date&sortOrder=desc&page=1&searches={}&heads=true'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded',
}
url = 'https://databases.usatoday.com/wp-admin/admin-ajax.php'
r = requests.post(url, data=body, headers=headers)

# Each entry under data -> Result is one row of the table.
tables = r.json()['data']['Result']
for table in tables:
    print(table['First_name'])
Example output:
Bradley
Deonte
Barkevious
Darius
Jarron
Tamorrion
Zaven
Frank
Justin
Aldon
Jeff
Marshon
Broderick
Frank
Jaydon
Kevin
Kemah
Chad
Isaiah
Rashard
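If you want to vary the request, for instance to ask for another page, it may be easier to build the body from a dict than to edit the string by hand. The following is a minimal sketch, assuming the field names from the body string above; build_body and first_names are hypothetical helper names, and I have not verified that changing page is all the endpoint needs for pagination. The sample payload is hand-made in the same shape as the response, so the snippet runs without a network call.

```python
from urllib.parse import urlencode

def build_body(page=1, security="3193d24eb0"):
    """Encode the form fields from the body string above for one page."""
    fields = {
        "action": "cspFetchTable",
        "security": security,  # value copied from the answer; may expire
        "pageID": 10,
        "blogID": "",
        "sortBy": "Date",
        "sortOrder": "desc",
        "page": page,
        "searches": "{}",
        "heads": "true",
    }
    return urlencode(fields)

def first_names(payload):
    """Pull First_name out of a decoded response shaped like r.json()."""
    return [row["First_name"] for row in payload["data"]["Result"]]

# Offline check with a hand-made payload in the same shape as the response.
sample = {"data": {"Result": [{"First_name": "Bradley"}, {"First_name": "Deonte"}]}}
print(build_body(page=2))
print(first_names(sample))
```

To fetch a page, pass data=build_body(page=n) to requests.post with the same url and headers as above, then call first_names(r.json()).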
Upvotes: 1