bmatt23

Reputation: 13

Is there a different way to scrape this with pandas?

I'm trying to scrape tables for NBA teams in past seasons from this website, and I need one specific table. I only really know how to scrape tables with pandas' read_html function, so that's what I've been using. When I checked len(tables), it reported only 5 tables, even though the page shows 14. The table I want is visible on the page, but pandas doesn't seem to think it exists. This is the code I used:

import pandas as pd

url = "https://www.basketball-reference.com/teams/BOS/1980.html"

tables = pd.read_html(url)
print(len(tables))  # only 5 tables are found

When I run it and look through all the tables, there are only 5, and the one I want isn't among them. Can anyone help?

Upvotes: 0

Views: 42

Answers (1)

Rob Raymond

Reputation: 31226

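  • Switch off JavaScript in your browser and reload the page.
  • The table is no longer displayed. View the page source and you will see that the table's HTML is commented out (<!-- ... -->), so it only appears when JavaScript runs. pd.read_html() ignores comments, which is why it finds fewer tables than the browser shows.
  • You can navigate to the commented-out sections of the HTML with BeautifulSoup.
  • Pass the HTML from that comment to pd.read_html().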
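If you want to confirm the second point without reading the page source by hand, here is a minimal sketch using only the standard library's html.parser. It counts the <table> tags a parser sees normally, then re-parses each HTML comment to count the tables hidden inside. `TableCounter`, `count_tables`, and the sample HTML are hypothetical illustrations (the sample only loosely mimics the page's layout), not part of the original answer.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags and collects the text of HTML comments."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1

    def handle_comment(self, data):
        # Markup inside <!-- ... --> arrives here as plain text, not as tags
        self.comments.append(data)


def count_tables(html_text):
    """Return (tables visible to a parser, tables hidden inside comments)."""
    outer = TableCounter()
    outer.feed(html_text)
    outer.close()
    hidden = 0
    for comment in outer.comments:
        inner = TableCounter()
        inner.feed(comment)  # re-parse the comment body as HTML
        inner.close()
        hidden += inner.tables
    return outer.tables, hidden


# Hypothetical page fragment: one normal table, one commented-out table
sample = """
<div><table id="visible"><tr><td>1</td></tr></table></div>
<div>
<!--
<div class="table_container" id="div_team_and_opponent">
<table id="team_and_opponent"><tr><td>2</td></tr></table>
</div>
-->
</div>
"""

print(count_tables(sample))  # (1, 1)
```

Passing `requests.get(url).text` for the page in the question should show whether the missing tables are the commented-out ones; the exact counts depend on the live page.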
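import requests
from io import StringIO
from bs4 import BeautifulSoup, Comment
import pandas as pd

res = requests.get("https://www.basketball-reference.com/teams/BOS/1980.html")
html = BeautifulSoup(res.content, "html.parser")

# The table's markup sits inside an HTML comment; find the comment containing its container id
table_id = "div_team_and_opponent"  # avoid naming this `id`, which shadows the builtin
hidden = html.find_all(string=lambda text: isinstance(text, Comment) and table_id in text)

# pandas >= 2.1 deprecates passing literal HTML strings, so wrap it in StringIO
df = pd.read_html(StringIO(str(hidden[0])))[0]
print(df)

Output: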
Unnamed: 0 G MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
0 Team 82 19880 3617 7387 0.49 162 422 0.384 3455 6965 0.496 1907 2449 0.779 1227 2457 3684 2198 809 308 1539 1974 9303
1 Team/G nan 242.4 44.1 90.1 0.49 2 5.1 0.384 42.1 84.9 0.496 23.3 29.9 0.779 15.0 30.0 44.9 26.8 9.9 3.8 18.8 24.1 113.5
2 Lg Rank nan 4 8 14 7 2 2 1 15 17 7 4 6 5 13 10 11 8 6 21 11 13 5
3 Year/Year nan 1.0% 2.6% 0.5% 0.009 nan nan nan -2.0% -5.2% 0.016 4.8% 5.5% -0.005 9.7% 2.5% 4.8% 10.2% 13.9% 8.8% -10.2% -0.2% 4.8%
4 Opponent 82 19880 3439 7313 0.47 74 259 0.286 3365 7054 0.477 1712 2222 0.77 1168 2294 3462 1867 686 419 1635 2059 8664
5 Opponent/G nan 242.4 41.9 89.2 0.47 0.9 3.2 0.286 41.0 86.0 0.477 20.9 27.1 0.77 14.2 28.0 42.2 22.8 8.4 5.1 19.9 25.1 105.7
6 Lg Rank nan 4 6 7 8 17 17 15 5 7 8 11 10 17 6 4 2 3 2 11 9 6 6
7 Year/Year nan 1.0% -10.8% -3.7% -0.037 nan nan nan -12.7% -7.1% -0.031 8.5% 6.9% 0.011 4.1% -6.5% -3.2% -14.0% -4.3% -4.3% 2.0% 1.7% -6.7%
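The code above extracts one table by the id of its container. If you want every table the browser shows (the 14 mentioned in the question), a minimal sketch is to combine the tables BeautifulSoup finds normally with those found by re-parsing each comment. It assumes every hidden table is a complete <table> element inside a single comment, and it needs pandas, BeautifulSoup, and a read_html parser backend such as lxml. `read_all_tables` is a hypothetical helper name, not a pandas or BeautifulSoup API.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup, Comment


def read_all_tables(page_html):
    """Return DataFrames for every <table>, visible or hidden in an HTML comment.

    Sketch only: assumes hidden tables are whole <table> elements wrapped
    in <!-- ... --> comments.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    # Tables outside comments (roughly what pd.read_html(url) already returns)
    fragments = [str(t) for t in soup.find_all("table")]
    # Comments are plain strings to BeautifulSoup, so re-parse those containing tables
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if "<table" in comment:
            inner = BeautifulSoup(str(comment), "html.parser")
            fragments.extend(str(t) for t in inner.find_all("table"))
    # Wrap each fragment in StringIO to avoid the literal-HTML deprecation warning
    return [pd.read_html(StringIO(f))[0] for f in fragments]
```

For the page in the question, something like `tables = read_all_tables(requests.get(url).text)` should return the visible tables followed by the commented-out ones; check `len(tables)` against what the browser shows, since the page layout can change.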

Upvotes: 1
