Reputation: 3
I am having trouble scraping ESPN Gamecast links from the ESPN scoreboard page. I have tried:
import requests
from bs4 import BeautifulSoup
site = "https://www.espn.com/mlb/scoreboard"
html = requests.get(site).text
soup = BeautifulSoup(html, 'html.parser').find_all('a')
links = [link.get('href') for link in soup]
but the Gamecast links do not show up in the result.
Upvotes: 0
Views: 1011
Reputation: 28630
It's loaded dynamically, so you need to either a) use something like Selenium, which lets the page render before you parse it with bs4, or b) go straight to the data source/API. The API is often the best option:
import requests

# ESPN scoreboard API endpoint (returns JSON)
api = 'http://site.api.espn.com/apis/site/v2/sports/baseball/mlb/scoreboard'
jsonData = requests.get(api).json()
events = jsonData['events']

links = []
for event in events:
    event_links = event['links']
    for each in event_links:
        # keep only the link labelled 'Gamecast'
        if each['text'] == 'Gamecast':
            links.append(each['href'])
Output:
print(links)
['http://www.espn.com/mlb/game/_/gameId/401228229', 'http://www.espn.com/mlb/game/_/gameId/401228235', 'http://www.espn.com/mlb/game/_/gameId/401228242', 'http://www.espn.com/mlb/game/_/gameId/401228240', 'http://www.espn.com/mlb/game/_/gameId/401228233', 'http://www.espn.com/mlb/game/_/gameId/401228234', 'http://www.espn.com/mlb/game/_/gameId/401228239', 'http://www.espn.com/mlb/game/_/gameId/401228237', 'http://www.espn.com/mlb/game/_/gameId/401228231', 'http://www.espn.com/mlb/game/_/gameId/401228232', 'http://www.espn.com/mlb/game/_/gameId/401228236', 'http://www.espn.com/mlb/game/_/gameId/401228230', 'http://www.espn.com/mlb/game/_/gameId/401228238', 'http://www.espn.com/mlb/game/_/gameId/401228243', 'http://www.espn.com/mlb/game/_/gameId/401228241']
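If you want to reuse this, one possible way to package the loop above is a small helper that works on the already-parsed JSON, so it can be checked without hitting the network. This is only a sketch: the names `gamecast_links` and `fetch_scoreboard` are made up for illustration, the sample payload is hand-written to mimic the `events` / `links` / `text` / `href` keys used above, and the `timeout` value is an arbitrary choice.

```python
def gamecast_links(payload):
    """Collect the href of every link labelled 'Gamecast' in a scoreboard payload."""
    links = []
    for event in payload.get("events", []):
        for link in event.get("links", []):
            # skip links with a different label or without an href
            if link.get("text") == "Gamecast" and "href" in link:
                links.append(link["href"])
    return links


def fetch_scoreboard():
    """Download and parse the scoreboard JSON (hypothetical wrapper, needs network)."""
    import requests  # imported lazily so gamecast_links() works without requests installed

    api = "http://site.api.espn.com/apis/site/v2/sports/baseball/mlb/scoreboard"
    response = requests.get(api, timeout=10)  # timeout is an arbitrary choice
    response.raise_for_status()               # fail loudly on HTTP errors
    return response.json()


# Hand-made payload (not real API output) with the same key layout as above.
sample = {
    "events": [
        {
            "links": [
                {"text": "Gamecast", "href": "http://www.espn.com/mlb/game/_/gameId/401228229"},
                {"text": "Some other link", "href": "http://www.espn.com/"},
            ]
        }
    ]
}
print(gamecast_links(sample))
```

With live data you would call `gamecast_links(fetch_scoreboard())` instead of passing `sample`.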
Upvotes: 0
Reputation: 11
Could it be that you left out the quotation marks? I tried the following and it produced output:
import requests
from bs4 import BeautifulSoup
site = 'https://www.espn.com/mlb/scoreboard/_/date/20210624'
html = requests.get(site).text
soup = BeautifulSoup(html, 'html.parser').find_all('a')
links = [link.get('href') for link in soup]
print(links)
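Note that this prints every anchor on the page, and `link.get('href')` returns `None` for anchors without an `href`. A rough way to narrow the list down, assuming the Gamecast URLs in the static HTML have the same `/game/_/gameId/` shape as the ones shown in the other answer (I have not verified that they do), might look like this. `filter_gamecast` is just an illustrative name, and the sample list is made up.

```python
def filter_gamecast(hrefs):
    """Keep only hrefs that contain the assumed Gamecast path '/game/_/gameId/'."""
    return [h for h in hrefs if h and "/game/_/gameId/" in h]


# Made-up sample standing in for the `links` list built above.
sample_links = [
    None,  # anchor without an href
    "/mlb/scoreboard",
    "http://www.espn.com/mlb/game/_/gameId/401228229",
]
print(filter_gamecast(sample_links))
# → ['http://www.espn.com/mlb/game/_/gameId/401228229']
```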
Upvotes: 1