Reputation: 139
I'm using BeautifulSoup to try and scrape data from the MLB gameday pages.
Right now, I'm simply trying to extract gameday ids.
Here's an example of a page:
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "http://gd2.mlb.com/components/game/mlb/year_2017/month_04/day_20/epg.xml"
soup = BeautifulSoup(urlopen(url), "lxml")
After this, I'm not sure how to navigate and find the ids.
They're stored in two different places for each game:
game_data_directory="/components/game/mlb/year_2017/month_04/day_20/
gid_2017_04_20_bosmlb_tormlb_1"
gameday="2017_04_20_bosmlb_tormlb_1"
What's the best way to find, and then store the ids?
Thanks.
Upvotes: 0
Views: 460
Reputation: 3770
import requests
from bs4 import BeautifulSoup

data = requests.get('http://gd2.mlb.com/components/game/mlb/year_2017/month_04/day_20/epg.xml')
data = BeautifulSoup(data.content, "lxml")

for game in data.find_all('game'):
    print(game['game_data_directory'])
    # Slice after the last '/' rather than hard-coding an offset like [46:]
    pos = game['game_data_directory'].rfind('/')
    print(game['game_data_directory'][pos + 1:])
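Alternatively, since each `game` element already carries the id in its `gameday` attribute, you can read that attribute directly and skip the string slicing. A minimal self-contained sketch, using a trimmed, hypothetical XML fragment and the stdlib-backed `html.parser` so it runs without a network call:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of epg.xml, trimmed to the relevant attributes.
xml = (
    '<epg>'
    '<game gameday="2017_04_20_bosmlb_tormlb_1" '
    'game_data_directory="/components/game/mlb/year_2017/month_04/day_20/'
    'gid_2017_04_20_bosmlb_tormlb_1"/>'
    '</epg>'
)

soup = BeautifulSoup(xml, "html.parser")

# The gameday attribute holds the id directly, so no slicing is needed.
ids = [game["gameday"] for game in soup.find_all("game")]
print(ids)  # ['2017_04_20_bosmlb_tormlb_1']
```

Collecting the ids into a list like this also answers the "store" part of the question: you end up with one id string per `game` element.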
Upvotes: 1