RookiePython

Reputation: 145

How do I pull the table data from this website?

Using the code below, I can't pull the College Football matchups from the pregame.com game center.

I've tried multiple class names with different elements, and I've even tried pulling the table with pandas, but I can't get the entire table. Is there another way to scrape it successfully?

from bs4 import BeautifulSoup
import requests

# Request headers are a dict of header name -> value
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36'}
url = 'https://pregame.com/game-center/?d=1636174800000&t=0&l=2&a=0&s=AwayRot&m=false&b=undefined&o=Current&c=All&k='
pregame = requests.get(url, headers=header).text

# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(pregame, 'lxml')

div = soup.find_all('p', class_='pggc-col-data pggc-away')
print(div)

Upvotes: 0

Views: 185

Answers (2)

chitown88

Reputation: 28565

You may need to do a little data manipulation and some joins, depending on what you are after. But you can get the data back in JSON format from the API and parse it.

import requests
import pandas as pd

url = 'https://socket.pregame.com/api/gamecenter/bootstrap'

# The game center's data comes back from this endpoint as JSON
jsonData = requests.get(url, timeout=30).json()

# One DataFrame per top-level key (Leagues, Events, Groups, Odds, Scores, ...)
data = {}
for each, v in jsonData.items():
    data[each] = pd.DataFrame(v)

# Preview the first rows of each table
for key, table in data.items():
    print(f'\n*** {key} ***')
    print(table.head(10).to_string())


# Map league name -> league id
leagues_dict = dict(zip(data['Leagues']['Name'], data['Leagues']['Id']))

final_data = {}
for k, v in leagues_dict.items():
    # Filter each table to this league and rename the id columns so they can be joined on
    events_df = data['Events'][data['Events']['LeagueId'] == v].rename(columns={'Id': 'EventId'})
    groups_df = data['Groups'][data['Groups']['LeagueId'] == v].rename(columns={'Id': 'EventGroupId'})
    odds_df = data['Odds'][data['Odds']['LeagueId'] == v]
    scores_df = data['Scores']

    # Attach group info to each event, then keep every odds row for the league
    final_df = events_df.merge(groups_df.drop('LeagueId', axis=1), how='left', on='EventGroupId')
    final_df = final_df.merge(odds_df.drop('LeagueId', axis=1), how='right', on='EventId')

    # Scores can be empty; only merge when there is something to join
    if len(scores_df) > 0:
        final_df = final_df.merge(scores_df, how='left', on='EventId')

    final_data[k] = final_df
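To see what the two merges do, here is a toy run. The key column names `EventId`, `EventGroupId` and `LeagueId` mirror the code above; every row, plus the `GroupName` and `Spread` columns, is invented for illustration. The left join attaches group info to each event, and the right join keeps every odds row, even one whose event is missing.

```python
import pandas as pd

# Invented sample rows; only the key column names mirror the answer's code.
events = pd.DataFrame({"EventId": [1, 2], "EventGroupId": [10, 10], "LeagueId": [2, 2]})
groups = pd.DataFrame({"EventGroupId": [10], "GroupName": ["Week 10"], "LeagueId": [2]})
odds = pd.DataFrame({"EventId": [1, 2, 3], "Spread": [-3.5, 7.0, 1.5], "LeagueId": [2, 2, 2]})

# Left join: every event keeps its row; group columns are filled where EventGroupId matches.
merged = events.merge(groups.drop("LeagueId", axis=1), how="left", on="EventGroupId")

# Right join: every odds row survives; EventId 3 has no event, so its event/group columns are NaN.
merged = merged.merge(odds.drop("LeagueId", axis=1), how="right", on="EventId")

print(merged)
```

If you only want odds for events you actually have, switching the second merge to `how='inner'` drops rows like `EventId` 3.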

You can also write each table to CSV and open it in Excel if that's easier for you to work with.
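A minimal sketch of that export, assuming `final_data` is the dict of league name -> DataFrame built above. The helper name `export_tables` and the default folder `pregame_csv` are made up for illustration. Filenames are sanitised because league names may contain spaces or characters that aren't valid in paths.

```python
import re
from pathlib import Path

import pandas as pd


def export_tables(tables: dict[str, pd.DataFrame], out_dir: str = "pregame_csv") -> list[Path]:
    """Write each DataFrame to <out_dir>/<sanitised name>.csv and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, df in tables.items():
        # Replace runs of anything other than letters, digits, '_' or '-' with '_'.
        safe = re.sub(r"[^A-Za-z0-9_-]+", "_", str(name)).strip("_") or "table"
        path = out / f"{safe}.csv"
        df.to_csv(path, index=False)  # index=False keeps the row index out of the file
        paths.append(path)
    return paths
```

Usage: `export_tables(final_data)` writes one CSV per league. Two names that sanitise to the same string would overwrite each other, so check the returned paths if that matters.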

Upvotes: 1

thedatadavis

Reputation: 89

The problem you're running into is that the data is loaded dynamically via JavaScript, so it isn't in the HTML that `requests` downloads.
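One quick way to confirm this is to count the elements carrying the target classes in the raw HTML you fetched. If the count is 0 while the browser shows the table, the rows are being added by JavaScript after the page loads. A minimal sketch using only the standard library's `html.parser`; `count_class` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains every class in `wanted`."""

    def __init__(self, wanted: str):
        super().__init__()
        self.wanted = set(wanted.split())
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None for bare attributes.
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_class(html: str, wanted: str) -> int:
    parser = ClassCounter(wanted)
    parser.feed(html)
    parser.close()
    return parser.count
```

With the question's code, `count_class(pregame, "pggc-col-data pggc-away")` tells you whether those `<p>` elements exist in the static HTML at all.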

You'll want to check out something like Selenium to work around this. Here's a good overview: How to Scrape Data From JavaScript-Based Website Using Python, Selenium, and Headless Web Driver
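A minimal Selenium sketch along those lines, assuming Selenium 4 and a local Chrome install. The CSS selector `p.pggc-col-data.pggc-away` is built from the class names in the question and may not match if the site has changed; `scrape_rendered_text` is a hypothetical name. The Selenium imports sit inside the function so the module can be imported even where Selenium isn't installed.

```python
def scrape_rendered_text(url: str, css_selector: str, timeout: float = 15.0) -> list[str]:
    """Load `url` in headless Chrome, wait for `css_selector`, return the matched elements' text."""
    # Imported lazily so this module can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the page's JavaScript has inserted at least one matching element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return [el.text for el in driver.find_elements(By.CSS_SELECTOR, css_selector)]
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

Call it with the game-center URL from the question as the first argument and `"p.pggc-col-data.pggc-away"` as the selector. If nothing matches within `timeout` seconds, `WebDriverWait` raises a `TimeoutException`.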

Upvotes: 1
