hanz

Reputation: 49

BeautifulSoup findAll not returning values on webpage

I want to scrape individual game pages on Yahoo Sports.

This is an example of the type of page I would like to scrape: https://sports.yahoo.com/nfl/atlanta-falcons-philadelphia-eagles-20180906021/?section=teamcomparison

Underneath the initial Box Score, you will see a tab titled "Team Comparison". What I am trying to obtain are the statistics that are underneath "Offensive/Defensive Team Ranks" for each team.

# The URL I would like to scrape.
url = 'https://sports.yahoo.com/nfl/atlanta-falcons-philadelphia-eagles-20180906021/?section=teamcomparison'

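# Assumed imports (not shown in the original post)
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# Reading in the HTML code with BeautifulSoup
uClient = uReq(url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")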
#page_soup

# Finding the segment of HTML code with my desired stats
stats = page_soup.findAll("div", {"class": "D(ib) Bxz(bb) W(100%)"})
print(stats)
### Result line -> In [743]: []

This should give me the list of offensive and defensive ranks per team (e.g., Atlanta Passing Yards Per Game = 309.3 and Passing Yards Per Game Rank = 4), but it only returns "[]" with no values. I believe this is because the page builds that section with JavaScript, but I am new to web scraping and not sure how to handle this.
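
One way to check the JavaScript hypothesis is to look for the target element in the HTML exactly as the server sends it, before any script runs. The sketch below uses only the standard library and two made-up HTML strings (`served_html` and `rendered_html` are hypothetical stand-ins, not the real Yahoo markup) to show that a parser can only find what is already in the response body.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count <div> tags whose class attribute equals a given string exactly."""

    def __init__(self, class_value):
        super().__init__()
        self.class_value = class_value
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "div" and dict(attrs).get("class") == self.class_value:
            self.count += 1


def count_divs(html, class_value):
    """Return how many matching <div> tags appear in the given HTML text."""
    parser = ClassCounter(class_value)
    parser.feed(html)
    return parser.count


TARGET = "D(ib) Bxz(bb) W(100%)"

# Hypothetical server response: the stats container is filled in later by a script.
served_html = (
    '<html><body><div id="app"></div><script>render()</script></body></html>'
)

# Hypothetical DOM after a browser has run that script.
rendered_html = (
    '<html><body><div id="app">'
    '<div class="D(ib) Bxz(bb) W(100%)">309.3</div>'
    "</div></body></html>"
)

print(count_divs(served_html, TARGET))    # 0
print(count_divs(rendered_html, TARGET))  # 1
```

If the same check on the real `page_html` finds nothing while the browser's element inspector shows the element, the content is most likely added client-side.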

Upvotes: 0

Views: 53

Answers (1)

marke

Reputation: 1074

This data is actually loaded from an API via AJAX, so you don't need to scrape the page; you can query the API yourself if you know how to compose the URL. For example, for the page in your post the URL is: https://sports.yahoo.com/site/api/resource/sports.game.team_stat_leaders;id=nfl.g.20180906021

So you only need to know the id part of the URL for each game. The JSON you get in response is a little obscure, but after a while it is possible to understand what is going on :).
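
As a rough sketch of composing that URL, the helper below derives the id from a game page URL. It assumes the id always has the form `<league>.g.<trailing digits of the page slug>`, which is inferred only from the single example above and may not hold for other pages; `build_api_url` and `API_BASE` are hypothetical names.

```python
import re
from urllib.parse import urlparse

# Endpoint taken from the example URL in this answer.
API_BASE = (
    "https://sports.yahoo.com/site/api/resource/sports.game.team_stat_leaders"
)


def build_api_url(page_url):
    """Build the API URL for a game page (assumed id format: <league>.g.<digits>)."""
    # Non-empty path segments, e.g. ["nfl", "atlanta-falcons-...-20180906021"]
    parts = [p for p in urlparse(page_url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Unexpected page URL: {page_url}")
    league, slug = parts[0], parts[1]

    # Assumption: the numeric game id is the run of digits ending the slug.
    match = re.search(r"(\d+)$", slug)
    if match is None:
        raise ValueError(f"No numeric game id found in: {slug}")
    return f"{API_BASE};id={league}.g.{match.group(1)}"


page = (
    "https://sports.yahoo.com/nfl/"
    "atlanta-falcons-philadelphia-eagles-20180906021/?section=teamcomparison"
)
print(build_api_url(page))
```

For the page in the question this reproduces the URL given above.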

Example code to get the data:

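import requests

# Query the API endpoint directly instead of parsing the rendered page
response = requests.get("https://sports.yahoo.com/site/api/resource/sports.game.team_stat_leaders;id=nfl.g.20180906021")
response.raise_for_status()  # stop early on an HTTP error
data = response.json()  # parsed JSON as nested dicts and lists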
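
To get a feel for an unfamiliar JSON structure, it can help to list every leaf value together with the path that leads to it. The following is a minimal sketch; `iter_paths` is a hypothetical helper and `sample` is made-up data, not the actual shape of the Yahoo response.

```python
def iter_paths(node, prefix=""):
    """Yield (path, value) pairs for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            yield from iter_paths(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_paths(value, f"{prefix}[{index}]")
    else:
        # Leaf value (string, number, bool or None)
        yield prefix, node


# Made-up sample; the real response has different keys.
sample = {"teams": [{"name": "A", "stats": {"rank": 4}}]}

for path, value in iter_paths(sample):
    print(path, "=", value)
# teams[0].name = A
# teams[0].stats.rank = 4
```

Calling `iter_paths(data)` on the real response and searching the printed paths for a known value such as 309.3 is one way to locate the statistics.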

Upvotes: 1
