Reputation: 145
Using the code below, I can't pull the College Football matchups from pregame.com's game center.
I've tried multiple class IDs with different elements, and even tried pulling the table with pandas, but I can't get the entire table. Is there another way to scrape it successfully?
from bs4 import BeautifulSoup
import lxml
import requests
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36'}
pregame = requests.get('https://pregame.com/game-center/?d=1636174800000&t=0&l=2&a=0&s=AwayRot&m=false&b=undefined&o=Current&c=All&k=', headers=header).text
soup = BeautifulSoup(pregame, 'lxml')
div = soup.find_all('p', class_ = 'pggc-col-data pggc-away')
print(div)
Upvotes: 0
Views: 185
Reputation: 28565
You may need to do a little data manipulation and some joins depending on what you are after, but you can get the data back in JSON format from the API and parse it.
import requests
import pandas as pd

url = 'https://socket.pregame.com/api/gamecenter/bootstrap'
jsonData = requests.get(url).json()

# Load each top-level key of the response into its own DataFrame
data = {}
for each, v in jsonData.items():
    data[each] = pd.DataFrame(v)

# Quick look at what each table contains
for key, table in data.items():
    print(f'\n*** {key} ***')
    print(table.head(10).to_string())
leagues_dict = dict(zip(data['Leagues']['Name'], data['Leagues']['Id']))

# Build one merged table per league from the Events, Groups, Odds and Scores data
final_data = {}
for k, v in leagues_dict.items():
    events_df = data['Events'][data['Events']['LeagueId'] == v].rename(columns={'Id': 'EventId'})
    groups_df = data['Groups'][data['Groups']['LeagueId'] == v].rename(columns={'Id': 'EventGroupId'})
    odds_df = data['Odds'][data['Odds']['LeagueId'] == v]
    scores_df = data['Scores']

    final_df = events_df.merge(groups_df.drop('LeagueId', axis=1), how='left', on='EventGroupId')
    final_df = final_df.merge(odds_df.drop('LeagueId', axis=1), how='right', on='EventId')
    if len(scores_df) > 0:
        final_df = final_df.merge(scores_df, how='left', on='EventId')

    final_data.update({k: final_df})
You can always write this out to CSV and view it in Excel too, if that's easier for you to work with.
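For example, a minimal sketch of writing each league's merged table to its own file (the filename pattern here is just illustrative):

for league, final_df in final_data.items():
    # one file per league, e.g. 'College Football.csv'
    final_df.to_csv(f'{league}.csv', index=False)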
Upvotes: 1
Reputation: 89
The problem you're running into is that the data is being loaded dynamically via JavaScript, so it isn't in the HTML that requests downloads.
You'll want to check out something like Selenium to work around this. Here's a good overview: How to Scrape Data From JavaScript-Based Website Using Python, Selenium, and Headless Web Driver
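As a rough sketch of that approach (assuming Selenium 4 with Chrome, which manages the driver for you; the wait time and the selector from your question may need adjusting once you inspect the rendered page):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')  # render the page without opening a browser window

driver = webdriver.Chrome(options=options)
driver.get('https://pregame.com/game-center/?d=1636174800000&t=0&l=2&a=0&s=AwayRot&m=false&b=undefined&o=Current&c=All&k=')
driver.implicitly_wait(10)  # give the JavaScript a chance to populate the table

# Hand the fully rendered HTML to BeautifulSoup and reuse the selector from the question
soup = BeautifulSoup(driver.page_source, 'lxml')
print(soup.find_all('p', class_='pggc-col-data pggc-away'))

driver.quit()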
Upvotes: 1