Reputation: 65
I am running into an issue when using BeautifulSoup to scrape data from www.basketball-reference.com. I've used BeautifulSoup on Bballreference before, so I am a little stumped as to what is happening (granted, I am a pretty huge noob, so please bear with me).
I am trying to scrape team season stats from https://www.basketball-reference.com/leagues/NBA_2020.html and am running into trouble from the very start:
from bs4 import BeautifulSoup
import requests
web_response = requests.get('https://www.basketball-reference.com/leagues/NBA_2020.html').text
soup = BeautifulSoup(web_response, 'lxml')
table = soup.find('table', id='team-stats-per_game')
print(table)
This prints None, so the table in question was not found, even though I can clearly locate that tag when inspecting the web page. Okay... no biggie so far (usually these errors are on my end), so I instead just print out the whole soup:
soup = BeautifulSoup(web_response, 'lxml')
print(soup)
I copy and paste that into https://codebeautify.org/htmlviewer/ to get a better view than from the terminal, and I see that it does not look how I would expect it to. Essentially the meta tags are fine, but everything else appears to have lost its opening and closing tags, just making my soup into an actual soup...
Again, no biggie (still pretty sure it is something that I am doing), so I go and grab the html from a simple blog site, print it, and paste it into codebeautify and lo and behold it looks normal. Now I have a suspicion that something is occurring on basketball-reference's side that is obscuring my ability to even grab the html.
My question is this: what exactly is going on here? I am assuming there's an 80% chance it is still me, but I am not so sure about the other 20% at this point. Can someone point out what I am doing wrong or how to grab the HTML?
Upvotes: 1
Views: 423
Reputation: 195408
The data is stored within the page, but inside an HTML comment, so the parser sees it as comment text rather than as tags.
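A minimal offline sketch of the effect, assuming a made-up miniature of the page (the HTML string below is hypothetical and only mimics the wrapper div and table id from the question; it is not the site's real markup):

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the real page: the table lives inside a comment.
html = """
<div id="all_team-stats-per_game">
<!--
<table id="team-stats-per_game"><tr><td>Dallas Mavericks</td><td>116.4</td></tr></table>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The commented-out markup is kept as comment text, so no <table> tag exists in the tree.
direct_hit = soup.find("table", id="team-stats-per_game")

# The comment itself is still reachable as a Comment node.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

print(direct_hit)     # None
print(len(comments))  # 1
```

This is why `soup.find('table', ...)` comes back empty even though the browser's inspector shows the table: the browser view can reflect markup that scripts have since pulled out of the comment.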
To parse it, you can do this, for example:
import requests
from bs4 import BeautifulSoup, Comment
web_response = requests.get('https://www.basketball-reference.com/leagues/NBA_2020.html').text
soup = BeautifulSoup(web_response, 'lxml')
# note: soup.find('table', id='team-stats-per_game') returns None here, because the table is commented out
# find the comment section where the data is stored
for idx, c in enumerate(soup.select_one('div#all_team-stats-per_game').contents):
    if isinstance(c, Comment):
        break
# load the data from comment:
soup2 = BeautifulSoup(soup.select_one('div#all_team-stats-per_game').contents[idx], 'html.parser')
# print data:
for tr in soup2.select('tr:has(td)'):
    tds = tr.select('td')
    for td in tds:
        print(td.get_text(strip=True), end='\t')
    print()
Prints:
Dallas Mavericks 67 241.5 41.6 90.0 .462 15.3 41.5 .369 26.3 48.5 .542 17.9 23.1 .773 10.6 36.4 47.0 24.5 6.3 5.0 12.8 19.0 116.4
Milwaukee Bucks* 65 240.8 43.5 91.2 .477 13.7 38.6 .356 29.8 52.6 .567 17.8 24.0 .742 9.5 42.2 51.7 25.9 7.4 6.0 14.9 19.2 118.6
Houston Rockets 64 241.2 41.1 90.7 .454 15.4 44.3 .348 25.7 46.4 .554 20.5 26.0 .787 10.4 34.6 44.9 21.5 8.5 5.1 14.7 21.6 118.1
Portland Trail Blazers 66 240.8 41.9 90.9 .461 12.6 33.8 .372 29.3 57.1 .513 17.3 21.7 .798 10.1 35.4 45.5 20.2 6.1 6.2 13.0 21.4 113.6
Atlanta Hawks 67 243.0 40.6 90.6 .449 12.0 36.1 .333 28.6 54.5 .525 18.5 23.4 .790 9.9 33.4 43.3 24.0 7.8 5.1 16.2 23.1 111.8
New Orleans Pelicans 64 242.3 42.6 92.2 .462 14.0 37.6 .372 28.6 54.6 .525 16.9 23.2 .729 11.2 35.8 47.0 27.0 7.6 5.1 16.2 21.0 116.2
Los Angeles Clippers 64 241.2 41.6 89.7 .464 12.2 33.2 .366 29.5 56.5 .522 20.8 26.2 .792 11.0 37.0 48.0 23.8 7.1 5.0 14.8 22.0 116.2
Washington Wizards 64 241.2 41.9 91.0 .461 12.3 33.1 .372 29.6 57.9 .511 19.5 24.8 .787 10.1 31.6 41.7 25.3 8.1 4.3 14.1 22.6 115.6
Memphis Grizzlies 65 240.4 42.8 91.0 .470 10.9 31.1 .352 31.8 59.9 .531 16.2 21.3 .761 10.4 36.3 46.7 27.0 8.0 5.6 15.3 20.8 112.6
Phoenix Suns 65 241.2 40.8 87.8 .464 11.2 31.7 .353 29.6 56.1 .527 19.8 24.0 .826 9.8 33.3 43.1 27.2 7.8 4.0 15.1 22.1 112.6
Miami Heat 65 243.5 39.6 84.4 .470 13.4 34.8 .383 26.3 49.6 .530 19.5 25.1 .778 8.5 36.0 44.5 26.0 7.4 4.5 14.9 20.4 112.2
Minnesota Timberwolves 64 243.1 40.4 91.6 .441 13.3 39.7 .336 27.1 52.0 .521 19.1 25.4 .753 10.5 34.3 44.8 23.8 8.7 5.7 15.3 21.4 113.3
Boston Celtics* 64 242.0 41.2 89.6 .459 12.4 34.2 .363 28.8 55.4 .519 18.3 22.8 .801 10.7 35.3 46.0 22.8 8.3 5.6 13.6 21.4 113.0
Toronto Raptors* 64 241.6 40.6 88.5 .458 13.8 37.0 .371 26.8 51.5 .521 18.1 22.6 .800 9.7 35.5 45.2 25.4 8.8 4.9 14.4 21.5 113.0
Los Angeles Lakers* 63 240.8 42.9 88.6 .485 11.2 31.4 .355 31.8 57.1 .556 17.3 23.7 .730 10.6 35.5 46.1 25.9 8.6 6.8 15.1 20.6 114.3
Denver Nuggets 65 242.3 41.8 88.9 .471 10.9 30.4 .358 31.0 58.5 .529 15.9 20.5 .775 10.8 33.5 44.3 26.5 8.1 4.6 13.7 20.0 110.4
San Antonio Spurs 63 242.8 42.0 89.5 .470 10.7 28.7 .371 31.4 60.8 .517 18.4 22.8 .809 8.8 35.6 44.4 24.5 7.2 5.5 12.3 19.2 113.2
Philadelphia 76ers 65 241.2 40.8 87.7 .465 11.4 31.6 .362 29.4 56.1 .523 16.6 22.1 .752 10.4 35.1 45.5 25.9 8.2 5.4 14.2 20.6 109.6
Indiana Pacers 65 241.5 42.2 88.4 .477 10.0 27.5 .363 32.2 60.9 .529 15.1 19.1 .787 8.8 34.0 42.8 25.9 7.2 5.1 13.1 19.6 109.3
Utah Jazz 64 240.4 40.1 84.6 .475 13.2 34.4 .383 27.0 50.2 .537 17.6 22.8 .772 8.8 36.3 45.1 22.2 5.9 4.0 14.9 20.0 111.0
Oklahoma City Thunder 64 241.6 40.3 85.1 .473 10.4 29.3 .355 29.9 55.8 .536 19.8 24.8 .797 8.1 34.6 42.7 21.9 7.6 5.0 13.5 18.8 110.8
Brooklyn Nets 64 243.1 40.0 90.0 .444 12.9 37.9 .340 27.1 52.2 .519 18.0 24.1 .744 10.8 37.6 48.5 24.0 6.5 4.6 15.5 20.7 110.8
Detroit Pistons 66 241.9 39.3 85.7 .459 12.0 32.7 .367 27.3 53.0 .515 16.6 22.4 .743 9.8 32.0 41.7 24.1 7.4 4.5 15.3 19.7 107.2
New York Knicks 66 241.9 40.0 89.3 .447 9.6 28.4 .337 30.4 61.0 .499 16.3 23.5 .694 12.0 34.5 46.5 22.1 7.6 4.7 14.3 22.2 105.8
Sacramento Kings 64 242.3 40.4 87.8 .459 12.6 34.7 .364 27.7 53.2 .522 15.6 20.3 .769 9.6 32.9 42.5 23.4 7.6 4.2 14.4 21.9 109.0
Cleveland Cavaliers 65 241.9 40.3 87.9 .458 11.2 31.8 .351 29.1 56.1 .519 15.1 19.9 .758 10.8 33.4 44.2 23.1 6.9 3.2 16.5 18.3 106.9
Chicago Bulls 65 241.2 39.6 88.6 .447 12.2 35.1 .348 27.4 53.5 .511 15.5 20.5 .755 10.5 31.4 41.9 23.2 10.0 4.1 15.5 21.8 106.8
Orlando Magic 65 240.4 39.2 88.8 .442 10.9 32.0 .341 28.3 56.8 .498 17.0 22.1 .770 10.4 34.2 44.5 24.0 8.4 5.7 12.6 17.6 106.4
Golden State Warriors 65 241.9 38.6 88.2 .438 10.4 31.3 .334 28.2 56.9 .495 18.7 23.2 .803 10.0 32.9 42.8 25.6 8.2 4.6 14.9 20.1 106.3
Charlotte Hornets 65 242.3 37.3 85.9 .434 12.1 34.3 .352 25.2 51.6 .489 16.2 21.6 .748 11.0 31.8 42.8 23.8 6.6 4.1 14.6 18.8 102.9
League Average 65 241.7 40.8 88.8 .460 12.1 33.9 .357 28.7 54.9 .523 17.7 22.9 .771 10.1 34.7 44.9 24.3 7.7 4.9 14.5 20.6 111.4
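If the same trick is needed for more than one table, the lookup can be wrapped in a small helper. The following is only a sketch: `find_commented_table` and `table_rows` are hypothetical names, and the `sample` HTML is a made-up stand-in for the downloaded page so the example runs without a network request. It assumes the table keeps its `id` inside the comment, as it does in the page above.

```python
from bs4 import BeautifulSoup, Comment


def find_commented_table(html, table_id):
    """Return the <table> with this id, looking inside HTML comments if needed."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is not None:
        return table
    # Fall back to parsing every comment that mentions the id.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if table_id not in comment:
            continue
        table = BeautifulSoup(comment, "html.parser").find("table", id=table_id)
        if table is not None:
            return table
    return None


def table_rows(table):
    """Collect the text of each <td>, skipping rows that have no <td> cells."""
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")
        if tr.find("td")
    ]


# Hypothetical sample; in practice pass requests.get(url).text instead.
sample = """
<div id="all_team-stats-per_game">
<!--
<table id="team-stats-per_game">
<tr><th>Team</th><th>PTS</th></tr>
<tr><td>Dallas Mavericks</td><td>116.4</td></tr>
<tr><td>Milwaukee Bucks*</td><td>118.6</td></tr>
</table>
-->
</div>
"""

table = find_commented_table(sample, "team-stats-per_game")
rows = table_rows(table)
print(rows)
```

Scanning all comments avoids hard-coding the wrapper div's id, at the cost of parsing any comment that happens to mention the table id.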
Upvotes: 1