Reputation: 1303
I'm trying to scrape this page: http://stats.nba.com/playerGameLogs.html?PlayerID=2544&pageNo=1&rowsPerPage=100
I want to get the table into a pandas DataFrame. I've tried BeautifulSoup, and it's obvious that won't work. I also tried selenium, but I'm not having much luck with it: it at least opens the browser and shows the correct output, but Firefox force-closes afterwards. I'm hoping someone has a better solution before I go further down the selenium path. I'd also prefer not to open a browser at all, as I would be doing this for thousands of pages.
Upvotes: 1
Views: 403
Reputation: 473863
There is no need to scrape the HTML or to use a high-level selenium approach.
Instead, simulate the underlying XHR request(s) that go to the server and return the JSON data used to fill the table on the page.
Here's an example using requests:
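import requests

# Endpoint the page calls in the background to fetch the game log
url = 'http://stats.nba.com/stats/playergamelog'

# Same parameters the page sends along with its XHR request
params = {
    'Season': '2013-14',
    'SeasonType': 'Regular Season',
    'LeagueID': '00',
    'PlayerID': '2544',
    'pageNo': '1',
    'rowsPerPage': '100'
}

response = requests.post(url, data=params)

# Decode the JSON body into Python dicts and lists
print(response.json())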
Prints the JSON structure containing the player game logs:
{u'parameters': {u'LeagueID': u'00',
u'PlayerID': 2544,
u'Season': u'2013-14',
u'SeasonType': u'Regular Season'},
u'resource': u'playergamelog',
u'resultSets': [{u'headers': [u'SEASON_ID',
u'Player_ID',
u'Game_ID',
u'GAME_DATE',
u'MATCHUP',
u'WL',
u'MIN',
u'FGM',
u'FGA',
u'FG_PCT',
u'FG3M',
u'FG3A',
u'FG3_PCT',
u'FTM',
u'FTA',
u'FT_PCT',
u'OREB',
u'DREB',
u'REB',
u'AST',
u'STL',
u'BLK',
u'TOV',
u'PF',
u'PTS',
u'PLUS_MINUS',
u'VIDEO_AVAILABLE'],
u'name': u'PlayerGameLog',
u'rowSet': [[u'22013',
2544,
u'0021301192',
u'APR 12, 2014',
u'MIA @ ATL',
u'L',
37,
10,
22,
0.455,
3,
7,
0.429,
4,
8,
0.5,
3,
5,
8,
5,
0,
1,
3,
2,
27,
-13,
1],
[u'22013',
2544,
u'0021301180',
u'APR 11, 2014',
u'MIA vs. IND',
u'W',
35,
11,
20,
0.55,
2,
4,
0.5,
12,
13,
0.923,
1,
5,
6,
1,
1,
1,
2,
1,
36,
13,
1],
[u'22013',
2544,
u'0021301167',
u'APR 09, 2014',
u'MIA @ MEM',
u'L',
41,
14,
23,
0.609,
3,
5,
0.6,
6,
7,
0.857,
1,
5,
6,
5,
2,
0,
5,
1,
37,
-8,
1],
...
}
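Since the goal was a pandas DataFrame, here is a minimal sketch of how that structure could be turned into one. It assumes the response keeps the shape shown above (a `resultSets` list whose entries carry `headers` and `rowSet`). The helper names `result_set_to_records` and `result_set_to_frame` are made up for illustration, and `sample` is a trimmed stand-in for `response.json()` holding only the first five columns of the first two rows.

```python
def result_set_to_records(payload, index=0):
    """Pair each row of one result set with its headers (standard library only)."""
    result_set = payload['resultSets'][index]
    headers = result_set['headers']
    return [dict(zip(headers, row)) for row in result_set['rowSet']]


def result_set_to_frame(payload, index=0):
    """Build a pandas DataFrame from the same result set (requires pandas)."""
    import pandas as pd  # imported lazily so the helper above works without pandas
    result_set = payload['resultSets'][index]
    return pd.DataFrame(result_set['rowSet'], columns=result_set['headers'])


# Trimmed stand-in for response.json(); values copied from the output above.
sample = {
    'resultSets': [{
        'name': 'PlayerGameLog',
        'headers': ['SEASON_ID', 'Player_ID', 'Game_ID', 'GAME_DATE', 'MATCHUP'],
        'rowSet': [
            ['22013', 2544, '0021301192', 'APR 12, 2014', 'MIA @ ATL'],
            ['22013', 2544, '0021301180', 'APR 11, 2014', 'MIA vs. IND'],
        ],
    }]
}

records = result_set_to_records(sample)
print(records[0]['MATCHUP'])
```

With a real response, `result_set_to_frame(response.json())` should give one row per game with the header names as columns. For thousands of players, the frames could be collected in a list and combined with `pandas.concat`.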
An alternative solution would be to use an NBA API; see several options here:
Upvotes: 3