Reputation: 49074
I am using pandas to grab some ice hockey stats from a web page as shown below:
import pandas as pd
url_goal = 'http://www.quanthockey.com/nhl/records/nhl-players-all-time-goals-per-game-leaders.html'
df_goal = pd.read_html(url_goal, index_col=0, header=0)[0]
This works great, but the problem is that switching to the second page of the stats table on the site does not change the URL, so I cannot use the same approach to grab more than the top 50 players. There is a JavaScript address to the table that does change as the page number switches. I have read a little about Selenium and BeautifulSoup, but I don't have these installed, so I would prefer to do it without them if possible. So my question is two-fold:
Is there any way to grab data from the different pages of this JavaScript table using only pandas and standard Python/SciPy libraries (Anaconda, to be exact)?
If not, how would you go about getting this data into a pandas DataFrame with the help of Selenium or your package of choice?
Upvotes: 1
Views: 1357
Reputation: 3709
Hint: Open the network analyzer in your browser and watch what happens when you navigate to different pages; you'll notice a GET request to a URL like
http://www.quanthockey.com/scripts/AjaxPaginate.php?cat=Records&pos=Players&SS=&af=0&nat=alltime&st=reg&sort=goals-per-game&page=3&league=NHL&lang=en&rnd=451318572
Notice the page part of the query string.
You can just iterate through the range of numbers corresponding to how many pages there are, increasing the page parameter by one each time, as sketched below.
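A minimal sketch of that approach, assuming the AjaxPaginate.php endpoint returns an HTML fragment that pd.read_html can parse directly; the page count of 3, dropping the rnd cache-busting parameter, and the names base_url and frames are illustrative assumptions, not taken from the site:
import pandas as pd

# Query string observed in the network analyzer; only the page parameter
# changes between requests (the rnd cache-buster is omitted here, which
# assumes the endpoint does not require it).
base_url = ('http://www.quanthockey.com/scripts/AjaxPaginate.php'
            '?cat=Records&pos=Players&SS=&af=0&nat=alltime&st=reg'
            '&sort=goals-per-game&league=NHL&lang=en&page={page}')

frames = []
for page in range(1, 4):  # e.g. the first three pages of 50 players each
    # read_html returns a list of DataFrames; the stats table is the first one
    df = pd.read_html(base_url.format(page=page), index_col=0, header=0)[0]
    frames.append(df)

df_goal = pd.concat(frames)  # stack the per-page tables into one DataFrame
print(df_goal.shape)
If read_html cannot fetch the fragment from the URL directly, the same loop should work by downloading each page with urllib (part of the standard library) and passing the resulting HTML string to read_html instead.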
Upvotes: 3