Stephen

Reputation: 3

ESPN Gamecast Python Webscraping

I am having trouble scraping ESPN Gamecast links from the espn scoreboard webpage. I have tried:

import requests
from bs4 import BeautifulSoup

site = "https://www.espn.com/mlb/scoreboard"
html = requests.get(site).text
soup = BeautifulSoup(html, 'html.parser').find_all('a')
links = [link.get('href') for link in soup]

but the links are not being recognized.

Upvotes: 0

Views: 1011

Answers (2)

chitown88

Reputation: 28630

The page is loaded dynamically, so you need to either a) use something like Selenium that allows the page to render before parsing with bs4, or b) go straight to the data source/API. The API is often the best option:

import requests

# Query ESPN's scoreboard API directly instead of scraping the rendered page.
api = 'http://site.api.espn.com/apis/site/v2/sports/baseball/mlb/scoreboard'

jsonData = requests.get(api).json()
events = jsonData['events']

# Each event carries a list of links; keep only the Gamecast one.
links = []
for event in events:
    event_links = event['links']
    for each in event_links:
        if each['text'] == 'Gamecast':
            links.append(each['href'])

Output:

print(links)
['http://www.espn.com/mlb/game/_/gameId/401228229', 'http://www.espn.com/mlb/game/_/gameId/401228235', 'http://www.espn.com/mlb/game/_/gameId/401228242', 'http://www.espn.com/mlb/game/_/gameId/401228240', 'http://www.espn.com/mlb/game/_/gameId/401228233', 'http://www.espn.com/mlb/game/_/gameId/401228234', 'http://www.espn.com/mlb/game/_/gameId/401228239', 'http://www.espn.com/mlb/game/_/gameId/401228237', 'http://www.espn.com/mlb/game/_/gameId/401228231', 'http://www.espn.com/mlb/game/_/gameId/401228232', 'http://www.espn.com/mlb/game/_/gameId/401228236', 'http://www.espn.com/mlb/game/_/gameId/401228230', 'http://www.espn.com/mlb/game/_/gameId/401228238', 'http://www.espn.com/mlb/game/_/gameId/401228243', 'http://www.espn.com/mlb/game/_/gameId/401228241']

Upvotes: 0

Jasper_0077

Reputation: 11

Could it be that you missed the quotation marks? I tried the following and it produced output.

import requests
from bs4 import BeautifulSoup

site = 'https://www.espn.com/mlb/scoreboard/_/date/20210624'
html = requests.get(site).text
soup = BeautifulSoup(html, 'html.parser').find_all('a')
links = [link.get('href') for link in soup]
print(links)
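Note that this prints every anchor's href on the page, including None for anchors without one. If the goal is just the Gamecast links, you can filter the list by path. A small sketch on a hypothetical sample of hrefs (the None entry and navigation links stand in for what a real scrape returns):

```python
# Hypothetical sample of hrefs: some anchors have no href (None),
# and most are navigation links rather than game pages.
hrefs = [
    None,
    '/mlb/',
    '/mlb/game/_/gameId/401228229',
    'https://www.espn.com/watch/',
    '/mlb/game/_/gameId/401228235',
]

# Keep only Gamecast-style links, skipping None values.
gamecasts = [h for h in hrefs if h and '/mlb/game/' in h]
print(gamecasts)  # ['/mlb/game/_/gameId/401228229', '/mlb/game/_/gameId/401228235']
```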

Upvotes: 1
