user1610719

Reputation: 1303

Scraping a website that uses JavaScript

I'm trying to scrape this page: http://stats.nba.com/playerGameLogs.html?PlayerID=2544&pageNo=1&rowsPerPage=100

I want to get the table into a pandas DataFrame. I tried BeautifulSoup, and it's obvious that won't work. I also tried selenium, but I'm not having much luck with it: it opens the browser and shows the correct output, but Firefox force closes afterwards. I'm hoping someone has a better solution before I go further down the selenium path. I would also prefer not to open a browser at all, as I would be doing this for thousands of pages.

Upvotes: 1

Views: 403

Answers (1)

alecxe

Reputation: 473863

There is no need to scrape the HTML or to drive a browser with selenium.

Instead, simulate the underlying XHR request that the page sends to the server. The response is the JSON data used to fill the table on the page.

Here's an example using requests:

import requests

url = 'http://stats.nba.com/stats/playergamelog'

params = {
    'Season': '2013-14',
    'SeasonType': 'Regular Season',
    'LeagueID': '00',
    'PlayerID': '2544',
    'pageNo': '1',
    'rowsPerPage': '100'
}
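# Request the endpoint that backs the table; it replies with JSON.
response = requests.post(url, data=params)

# print() with a single argument works on both Python 2 and Python 3.
print(response.json())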

Prints the JSON structure containing the player game logs:

{u'parameters': {u'LeagueID': u'00',
                 u'PlayerID': 2544,
                 u'Season': u'2013-14',
                 u'SeasonType': u'Regular Season'},
 u'resource': u'playergamelog',
 u'resultSets': [{u'headers': [u'SEASON_ID',
                               u'Player_ID',
                               u'Game_ID',
                               u'GAME_DATE',
                               u'MATCHUP',
                               u'WL',
                               u'MIN',
                               u'FGM',
                               u'FGA',
                               u'FG_PCT',
                               u'FG3M',
                               u'FG3A',
                               u'FG3_PCT',
                               u'FTM',
                               u'FTA',
                               u'FT_PCT',
                               u'OREB',
                               u'DREB',
                               u'REB',
                               u'AST',
                               u'STL',
                               u'BLK',
                               u'TOV',
                               u'PF',
                               u'PTS',
                               u'PLUS_MINUS',
                               u'VIDEO_AVAILABLE'],
                  u'name': u'PlayerGameLog',
                  u'rowSet': [[u'22013',
                               2544,
                               u'0021301192',
                               u'APR 12, 2014',
                               u'MIA @ ATL',
                               u'L',
                               37,
                               10,
                               22,
                               0.455,
                               3,
                               7,
                               0.429,
                               4,
                               8,
                               0.5,
                               3,
                               5,
                               8,
                               5,
                               0,
                               1,
                               3,
                               2,
                               27,
                               -13,
                               1],
                              [u'22013',
                               2544,
                               u'0021301180',
                               u'APR 11, 2014',
                               u'MIA vs. IND',
                               u'W',
                               35,
                               11,
                               20,
                               0.55,
                               2,
                               4,
                               0.5,
                               12,
                               13,
                               0.923,
                               1,
                               5,
                               6,
                               1,
                               1,
                               1,
                               2,
                               1,
                               36,
                               13,
                               1],
                              [u'22013',
                               2544,
                               u'0021301167',
                               u'APR 09, 2014',
                               u'MIA @ MEM',
                               u'L',
                               41,
                               14,
                               23,
                               0.609,
                               3,
                               5,
                               0.6,
                               6,
                               7,
                               0.857,
                               1,
                               5,
                               6,
                               5,
                               2,
                               0,
                               5,
                               1,
                               37,
                               -8,
                               1],
    ...
}
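
Since the goal was a pandas DataFrame, here is a minimal sketch of turning that structure into one. It assumes the response keeps the shape shown above (a `resultSets` list whose entries carry `name`, `headers` and `rowSet`). The helper names `find_result_set` and `to_frame` are made up for illustration, and `sample` is a trimmed copy of the output above with only four of the columns:

```python
def find_result_set(payload, name="PlayerGameLog"):
    """Return (headers, rows) for the result set with the given name."""
    for result_set in payload["resultSets"]:
        if result_set["name"] == name:
            return result_set["headers"], result_set["rowSet"]
    raise KeyError("no result set named %r" % name)


def to_frame(payload, name="PlayerGameLog"):
    """Build a DataFrame with one column per header (hypothetical helper)."""
    import pandas as pd  # imported here so find_result_set works without pandas

    headers, rows = find_result_set(payload, name)
    return pd.DataFrame(rows, columns=headers)


# Trimmed sample in the same shape as the JSON above (assumption: the real
# response has many more columns and rows).
sample = {
    "resultSets": [
        {
            "name": "PlayerGameLog",
            "headers": ["GAME_DATE", "MATCHUP", "WL", "PTS"],
            "rowSet": [
                ["APR 12, 2014", "MIA @ ATL", "L", 27],
                ["APR 11, 2014", "MIA vs. IND", "W", 36],
            ],
        }
    ]
}

headers, rows = find_result_set(sample)
```

With the real response, `to_frame(response.json())` should give the game log as a DataFrame, provided the structure matches the assumption above.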

An alternative solution would be to use an NBA API; see several options here:

Upvotes: 3
