Justin

Reputation: 139

Scraping MLB Gameday Data

I'm using BeautifulSoup to try and scrape data from the MLB gameday pages.

Right now, I'm simply trying to extract gameday ids.

Here's an example of a page:

url = "http://gd2.mlb.com/components/game/mlb/year_2017/month_04/day_20/epg.xml"

soup = BeautifulSoup(urlopen(d_url), "lxml")

After this, I'm not sure how to navigate and find the ids.

They're stored in two different places for each game:

  game_data_directory="/components/game/mlb/year_2017/month_04/day_20/gid_2017_04_20_bosmlb_tormlb_1"

  gameday="2017_04_20_bosmlb_tormlb_1"

What's the best way to find and then store the ids?

Thanks.

Upvotes: 0

Views: 460

Answers (1)

iamklaus

Reputation: 3770

import requests
from bs4 import BeautifulSoup

data = requests.get('http://gd2.mlb.com/components/game/mlb/year_2017/month_04/day_20/epg.xml')
soup = BeautifulSoup(data.content, "lxml")

for game in soup.find_all('game'):
    print(game['game_data_directory'])
    # slice off everything up to and including the last '/' to keep only the gid
    pos = game['game_data_directory'].rfind('/')
    print(game['game_data_directory'][pos + 1:])
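
Since you also want to store the ids, here is a minimal sketch that collects them into a list instead of printing, assuming the `gameday` attribute is present on every `<game>` element as shown in your question:

    import requests
    from bs4 import BeautifulSoup

    url = 'http://gd2.mlb.com/components/game/mlb/year_2017/month_04/day_20/epg.xml'
    soup = BeautifulSoup(requests.get(url).content, "lxml")

    # each <game> element carries the id directly in its "gameday" attribute
    game_ids = [game['gameday'] for game in soup.find_all('game')]
    print(game_ids)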

Upvotes: 1
