Reputation: 33
This seems similar to my previous post (I'll link it at the bottom), but this is a different URL and it uses tables. When I run the following code, I can extract all of the data inside that div:
import requests
from bs4 import BeautifulSoup

url = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"
r = requests.get(url)
soup = BeautifulSoup(r.text, "lxml")
try:
    data = soup.find('div', class_='div-col1')
    print(data)
except:
    print("You Get Nothing!")
I then change the try block to:
try:
    data = soup.find_all('td', class_='car')
    print(data)
except:
    print("You Get Nothing!")
and I only get the info from the thead, not the tbody.
Is there something I'm missing or doing wrong? The further in I try to drill down, the more I either get an error or just get an empty [ ] back.
Also, this web page is dynamic. I tried what was given to me in my previous thread (Old Post), and I understand that the layout and code of the two pages differ. My concern with that approach is that launching Chrome every time I run the script will be a lot of overhead, since the data will probably need to be refreshed every 30 seconds to 1 minute, 300-400 times.
Upvotes: 0
Views: 1001
Reputation: 22440
The data you wish to fetch from that page is generated dynamically, so a plain HTTP request made with the requests library can't retrieve it. However, you can try requests-html, a newer library from the same author, which can render dynamically generated content. This is how you can use it:
import requests_html

URL = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"

with requests_html.HTMLSession() as session:
    r = session.get(URL)
    r.html.render(sleep=5)
    for items in r.html.find('#pqrStatistic tr'):
        data = [item.text for item in items.find("th,td")]
        print(data)
Partial results:
['pos', 'car', 'driver', 'manuf', 'delta', 'laps', 'last lap', 'best time', 'best speed', 'best lap']
['1', '54', 'Kyle Benjamin(i)', '', '--', '161', '36.474', '20.198', '93.752', '8']
['2', '98', 'Grant Enfinger', '', '0.761', '161', '36.402', '20.144', '94.003', '157']
['3', '4', 'Todd Gilliland #', '', '1.407', '161', '36.359', '20.142', '94.013', '158']
['4', '8', 'John H. Nemechek(i)', '', '2.177', '161', '36.304', '20.234', '93.585', '31']
['5', '16', 'Brett Moffitt', '', '3.268', '161', '36.145', '20.359', '93.010', '8']
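If named fields are easier to work with than positional lists, the first row can serve as the header for the rest. This is only a minimal sketch: the helper name rows_to_records is made up for illustration, and the rows are copied from the partial results above rather than fetched live.

```python
# Sample rows copied from the partial results above; in the real script
# they would be collected inside the loop instead of being printed.
rows = [
    ['pos', 'car', 'driver', 'manuf', 'delta', 'laps',
     'last lap', 'best time', 'best speed', 'best lap'],
    ['1', '54', 'Kyle Benjamin(i)', '', '--', '161',
     '36.474', '20.198', '93.752', '8'],
    ['2', '98', 'Grant Enfinger', '', '0.761', '161',
     '36.402', '20.144', '94.003', '157'],
]


def rows_to_records(rows):
    """Pair every data row with the header row (the first one)."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


records = rows_to_records(rows)
print(records[0]['driver'], records[0]['best speed'])  # Kyle Benjamin(i) 93.752
```

All values stay strings here; convert them with int() or float() where you need numbers, keeping in mind that placeholders such as '--' will not convert.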
Upvotes: 0
Reputation: 1433
Why not go directly to the source? If you look at the page source of that link, you'll see it gets its data from https://www.nascar.com/live/feeds/live-feed.json. Requesting that URL gives you the data as JSON, which you can parse however you like.
import requests

url = "https://www.nascar.com/live/feeds/live-feed.json"
res = requests.get(url)
# Response.json() decodes the body, so the json module is not needed
print(res.json())
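Since the question mentions refreshing every 30 seconds to a minute, a few hundred times, here is a rough sketch of how the JSON could be polled without a browser. The key names ("vehicles", "running_position", "vehicle_number", "driver", "full_name", "laps_completed") are assumptions about the feed's layout, not something confirmed in this thread, and extract_rows and poll are hypothetical helper names. Print the real JSON once and adjust the keys to match.

```python
import time


def extract_rows(feed):
    """Flatten the feed into one list per car.

    Every key name used here is an assumption about the JSON layout.
    """
    rows = []
    for car in feed.get("vehicles", []):
        rows.append([
            car.get("running_position"),
            car.get("vehicle_number"),
            car.get("driver", {}).get("full_name"),
            car.get("laps_completed"),
        ])
    return rows


def poll(fetch, interval=30, max_polls=400, sleep=time.sleep):
    """Call fetch() up to max_polls times, pausing between calls."""
    for i in range(max_polls):
        yield extract_rows(fetch())
        if i < max_polls - 1:
            sleep(interval)


# Made-up stand-in for the real feed so the sketch runs offline.
sample = {"vehicles": [{"running_position": 1,
                        "vehicle_number": "54",
                        "driver": {"full_name": "Kyle Benjamin"},
                        "laps_completed": 161}]}

for rows in poll(lambda: sample, interval=0, max_polls=2):
    print(rows)
```

Against the live feed, fetch could be something like lambda: requests.get(url).json(). Passing the fetch function in keeps the loop testable without network access.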
Upvotes: 2