sbiondio

Reputation: 33

Python table scrape returning no data

This seems similar to my previous post (I'll link it at the bottom), but this is a different URL and it uses tables. When I run the following code, I can extract all of the data within that div:

import requests
from bs4 import BeautifulSoup

url = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"
r = requests.get(url)

soup = BeautifulSoup(r.text, "lxml")

try:
    data = soup.find('div', class_='div-col1')
    print(data)
except Exception:
    print("You Get Nothing!")

I then change the try block to:

try:
    data = soup.find_all('td', class_='car')
    print(data)
except Exception:
    print("You Get Nothing!")

and I only get the info from the thead, not the tbody.

Is there something I'm missing or doing wrong? The further in I try to narrow down, I either get an error or an empty list [ ].

Also, this webpage is dynamic. I tried what was given to me in my previous thread (Old Post), and I understand that the layout and code of the two pages are different. My concern is that launching Chrome every time I run the script will be heavy, since the data will probably need to be refreshed every 30 seconds to 1 minute, 300-400 times.

Upvotes: 0

Views: 1001

Answers (2)

SIM

Reputation: 22440

The data you wish to fetch from that page is generated dynamically, so a plain HTTP request made with the requests library can't see it: requests does not execute JavaScript. However, you can try requests-html, a newer library from the same author, which can render dynamically generated content. This is how you can use it:

import requests_html

URL = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"

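with requests_html.HTMLSession() as session:
    r = session.get(URL)
    # Run the page's JavaScript in headless Chromium, then wait 5 s for the table to fill
    r.html.render(sleep=5)
    # Each <tr> becomes one list of cell texts (header cells included)
    for items in r.html.find('#pqrStatistic tr'):
        data = [item.text for item in items.find("th,td")]
        print(data)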

Partial results:

['pos', 'car', 'driver', 'manuf', 'delta', 'laps', 'last lap', 'best time', 'best speed', 'best lap']
['1', '54', 'Kyle Benjamin(i)', '', '--', '161', '36.474', '20.198', '93.752', '8']
['2', '98', 'Grant Enfinger', '', '0.761', '161', '36.402', '20.144', '94.003', '157']
['3', '4', 'Todd Gilliland #', '', '1.407', '161', '36.359', '20.142', '94.013', '158']
['4', '8', 'John H. Nemechek(i)', '', '2.177', '161', '36.304', '20.234', '93.585', '31']
['5', '16', 'Brett Moffitt', '', '3.268', '161', '36.145', '20.359', '93.010', '8']
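
If you want to look values up by column name, the lists printed above can be turned into dictionaries keyed by the header row. This is only a minimal sketch: `rows_to_records` is a hypothetical helper name, and it assumes the first row is always the header, as in the partial results.

```python
from typing import Dict, Iterable, List


def rows_to_records(rows: Iterable[List[str]]) -> List[Dict[str, str]]:
    """Pair every data row with the header row (assumed to be the first row)."""
    it = iter(rows)
    header = next(it, None)
    if header is None:
        return []
    # Skip rows whose cell count does not match the header
    return [dict(zip(header, row)) for row in it if len(row) == len(header)]


# Sample rows copied from the partial results above
rows = [
    ['pos', 'car', 'driver', 'manuf', 'delta', 'laps', 'last lap', 'best time', 'best speed', 'best lap'],
    ['1', '54', 'Kyle Benjamin(i)', '', '--', '161', '36.474', '20.198', '93.752', '8'],
    ['2', '98', 'Grant Enfinger', '', '0.761', '161', '36.402', '20.144', '94.003', '157'],
]

records = rows_to_records(rows)
print(records[0]['driver'], records[0]['best speed'])  # Kyle Benjamin(i) 93.752
```

In the scraping loop you would collect each `data` list into `rows` instead of printing it, then call the helper once at the end.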

Upvotes: 0

johnII

Reputation: 1433

Why not go directly to the source? If you look at the page source of that link, you'll see it gets its data from https://www.nascar.com/live/feeds/live-feed.json. From there you can easily get the data in JSON format and parse it as you like.

import requests

url = "https://www.nascar.com/live/feeds/live-feed.json"
res = requests.get(url)
print(res.json())  # decode the JSON body into Python objects
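
Since the question mentions refreshing every 30 seconds to 1 minute, a few hundred times, here is a rough sketch of a polling loop around that feed. It is an illustration under assumptions, not a tested scraper: `fetch_feed` and `poll` are hypothetical names, it uses the standard library's `urllib` instead of requests so it runs without extra packages, and it makes no assumption about the feed's field names because they are not shown in this thread.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Iterator

FEED_URL = "https://www.nascar.com/live/feeds/live-feed.json"


def fetch_feed(url: str = FEED_URL, timeout: float = 10.0) -> Any:
    """Download the feed and decode it from JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def poll(fetch: Callable[[], Any], interval: float, max_polls: int,
         sleep: Callable[[float], None] = time.sleep) -> Iterator[Any]:
    """Yield one snapshot per successful poll, waiting `interval` seconds between polls."""
    for i in range(max_polls):
        try:
            snapshot = fetch()
        except (OSError, ValueError) as exc:
            # Network errors are OSError subclasses; invalid JSON raises ValueError
            print(f"poll {i + 1} failed: {exc}")
        else:
            yield snapshot
        if i < max_polls - 1:
            sleep(interval)


# Example use (makes real network requests):
# for snapshot in poll(fetch_feed, interval=30, max_polls=400):
#     print(type(snapshot).__name__)
```

Because no browser is launched, each refresh costs a single HTTP request, which addresses the concern about starting Chrome on every run.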

Upvotes: 2
