Reputation: 25
I'm trying to scrape some information about MLB players from the MLB website. However, using urllib2 and BeautifulSoup, I can't find the contents of a particular 'div', even though I can clearly see them in Chrome.
For example, on this page (http://mlb.mlb.com/team/player.jsp?player_id=150378) the Status info in the upper right shows 'Released', but I can't find this string using BS4.
Here's my code:
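import urllib2
from bs4 import BeautifulSoup

base_url = 'http://mlb.mlb.com/team/player.jsp?player_id=150378'
request = urllib2.Request(base_url)
response = urllib2.urlopen(request)
# Parse the HTML exactly as the server returned it (no JavaScript is executed)
soup = BeautifulSoup(response)
player_status = soup.findAll('div',id='player_status')
print player_status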
I expected it to contain a string like 'Status: Released', but the result only shows:
[<div id="player_status"></div>]
I have never encountered this problem before. Can someone help me with this? Thanks!!
Upvotes: 2
Views: 1084
Reputation: 473863
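The player information on the page is not in the initial HTML. It comes from the response to an additional XHR request that the browser makes to a JSON API, which is why urllib2 and BeautifulSoup only see an empty div. You can simulate that request, for example, using requests: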
>>> import requests
>>>
>>> url = "http://mlb.mlb.com/lookup/json/named.player_info.bam?sport_code=%27mlb%27&player_id=150378"
>>>
>>> response = requests.get(url)
>>> data = response.json()
>>> data['player_info']['queryResults']['row']['status']
Released
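If you need this for more than one player, the lookup above could be wrapped roughly as follows. This is only a sketch: `extract_status` and `fetch_player_status` are hypothetical helper names, and it assumes the endpoint keeps returning the same nested `player_info` / `queryResults` / `row` layout shown above, with `row` being a single object.

```python
# Endpoint taken from the URL above; the query parameters mirror its query string.
LOOKUP_URL = "http://mlb.mlb.com/lookup/json/named.player_info.bam"


def extract_status(data):
    """Return the 'status' field from a parsed player_info payload, or None.

    Assumes the nesting seen in the response above; anything else yields None.
    """
    try:
        return data["player_info"]["queryResults"]["row"]["status"]
    except (KeyError, TypeError):
        # Missing keys, or 'row' is not a single object as assumed here.
        return None


def fetch_player_status(player_id, timeout=10):
    """Hypothetical helper: query the lookup endpoint for one player."""
    # Imported here so extract_status stays usable without requests installed.
    import requests

    params = {"sport_code": "'mlb'", "player_id": player_id}
    response = requests.get(LOOKUP_URL, params=params, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return extract_status(response.json())
```

Calling `fetch_player_status(150378)` should then give the same value as the session above, provided the API still responds in that shape.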
Upvotes: 1