Reputation: 139
I'm using BeautifulSoup to try and scrape data from the MLB gameday pages.
Right now, I'm simply trying to extract gameday ids.
Here's an example of a page:
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "http://gd2.mlb.com/components/game/mlb/year_2017/month_04/day_20/epg.xml"
soup = BeautifulSoup(urlopen(url), "lxml")
After this, I'm not sure how to navigate and find the ids.
They're stored in two different places for each game:
game_data_directory="/components/game/mlb/year_2017/month_04/day_20/
gid_2017_04_20_bosmlb_tormlb_1"
gameday="2017_04_20_bosmlb_tormlb_1"
What's the best way to find, and then store the ids?
Thanks.
Upvotes: 0
Views: 460
Reputation: 3770
import requests
from bs4 import BeautifulSoup

data = requests.get('http://gd2.mlb.com/components/game/mlb/year_2017/month_04/day_20/epg.xml')
data = BeautifulSoup(data.content, "lxml")

for game in data.find_all('game'):
    print(game['game_data_directory'])
    # Slice after the last '/' rather than hard-coding an offset like [46:]
    pos = game['game_data_directory'].rfind('/')
    print(game['game_data_directory'][pos + 1:])
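Alternatively, since each `game` element already carries the id in its `gameday` attribute, you can read that attribute directly and skip the string slicing. A minimal self-contained sketch, using a trimmed, hypothetical XML fragment and the stdlib-backed `html.parser` so it runs without a network call:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of epg.xml, trimmed to the relevant attributes.
xml = (
    '<epg>'
    '<game gameday="2017_04_20_bosmlb_tormlb_1" '
    'game_data_directory="/components/game/mlb/year_2017/month_04/day_20/'
    'gid_2017_04_20_bosmlb_tormlb_1"/>'
    '</epg>'
)

soup = BeautifulSoup(xml, "html.parser")

# The gameday attribute holds the id directly, so no slicing is needed.
ids = [game["gameday"] for game in soup.find_all("game")]
print(ids)  # ['2017_04_20_bosmlb_tormlb_1']
```

Collecting the ids into a list like this also answers the "store" part of the question: you end up with one id string per `game` element.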
Upvotes: 1