Jacob Garwin

Opening and Closing Tags Are Removed from HTML When Using BeautifulSoup

I am running into an issue when using BeautifulSoup to scrape data off of www.basketball-reference.com. I've used BeautifulSoup on Bballreference before, so I am a little stumped as to what is happening (granted, I am a pretty huge noob, so please bear with me).

I am trying to scrape team season stats from https://www.basketball-reference.com/leagues/NBA_2020.html and am running into trouble from the very start:

from bs4 import BeautifulSoup
import requests

web_response = requests.get('https://www.basketball-reference.com/leagues/NBA_2020.html').text
soup = BeautifulSoup(web_response, 'lxml')

table = soup.find('table', id='team-stats-per_game')
print(table)

This prints None, so the table was not found, even though I can clearly locate that tag when inspecting the web page. Okay... no biggie so far (usually these errors are on my end), so I print out the whole soup instead:

soup = BeautifulSoup(web_response, 'lxml')
print(soup)

I copy and paste that into https://codebeautify.org/htmlviewer/ to get a better view than the terminal gives me, and it does not look the way I would expect. The meta tags are fine, but everything else appears to have lost its opening and closing tags, turning my soup into an actual soup...

Again, no biggie (I'm still pretty sure it's something I am doing), so I grab the HTML from a simple blog site, print it, and paste it into codebeautify, and lo and behold, it looks normal. Now I suspect that something on basketball-reference's side is obscuring my ability to even grab the HTML.

My question is this: what exactly is going on here? I'd put it at an 80% chance that it's still me, but I'm not so sure about the other 20% at this point. Can someone point out what I am doing wrong, or how to grab the HTML?

Answers (1)

Andrej Kesely

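The data is stored within the page, but inside an HTML comment. BeautifulSoup does not parse the contents of a comment as tags; it keeps the whole comment as a single string (a Comment object), so soup.find('table', id='team-stats-per_game') cannot see the table and returns None.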
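
A quick way to confirm this is to check that the table's id occurs in the raw text while find() still returns None, and that the wrapper div holds a Comment node. This is a minimal sketch: raw_html is a hypothetical, trimmed-down stand-in for the downloaded page, not its real markup, and 'html.parser' is used so that lxml is not required.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, trimmed-down stand-in for the downloaded page: the table
# markup is wrapped in an HTML comment inside the container div.
raw_html = """
<div id="all_team-stats-per_game">
<!--
<table id="team-stats-per_game"><tr><th>1</th><td>Dallas Mavericks</td><td>67</td></tr></table>
-->
</div>
"""

soup = BeautifulSoup(raw_html, 'html.parser')

# The table's id is present in the raw text...
id_in_raw_text = 'id="team-stats-per_game"' in raw_html
# ...but no <table> element was built for it, so find() returns None.
table_found = soup.find('table', id='team-stats-per_game') is not None

# The commented-out markup is kept as one Comment string in the container.
container = soup.find('div', id='all_team-stats-per_game')
comments = [c for c in container.contents if isinstance(c, Comment)]

print(id_in_raw_text, table_found, len(comments))  # True False 1
```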

To parse it, you can re-parse the comment's text as a separate document, for example:

import requests
from bs4 import BeautifulSoup, Comment

web_response = requests.get('https://www.basketball-reference.com/leagues/NBA_2020.html').text
soup = BeautifulSoup(web_response, 'lxml')

# soup.find('table', id='team-stats-per_game') would return None here,
# because the table only exists as text inside an HTML comment.

# find the comment inside the wrapper div where the data is stored
container = soup.select_one('div#all_team-stats-per_game')
comment = next(c for c in container.contents if isinstance(c, Comment))

# load the data from the comment:
soup2 = BeautifulSoup(comment, 'html.parser')

# print data (only rows that contain <td> cells, so the header row is skipped):
for tr in soup2.select('tr:has(td)'):
    tds = tr.select('td')
    for td in tds:
        print(td.get_text(strip=True), end='\t')
    print()

Prints:

Dallas Mavericks    67  241.5   41.6    90.0    .462    15.3    41.5    .369    26.3    48.5    .542    17.9    23.1    .773    10.6    36.4    47.0    24.5    6.3 5.0 12.8    19.0    116.4   
Milwaukee Bucks*    65  240.8   43.5    91.2    .477    13.7    38.6    .356    29.8    52.6    .567    17.8    24.0    .742    9.5 42.2    51.7    25.9    7.4 6.0 14.9    19.2    118.6   
Houston Rockets 64  241.2   41.1    90.7    .454    15.4    44.3    .348    25.7    46.4    .554    20.5    26.0    .787    10.4    34.6    44.9    21.5    8.5 5.1 14.7    21.6    118.1   
Portland Trail Blazers  66  240.8   41.9    90.9    .461    12.6    33.8    .372    29.3    57.1    .513    17.3    21.7    .798    10.1    35.4    45.5    20.2    6.1 6.2 13.0    21.4    113.6   
Atlanta Hawks   67  243.0   40.6    90.6    .449    12.0    36.1    .333    28.6    54.5    .525    18.5    23.4    .790    9.9 33.4    43.3    24.0    7.8 5.1 16.2    23.1    111.8   
New Orleans Pelicans    64  242.3   42.6    92.2    .462    14.0    37.6    .372    28.6    54.6    .525    16.9    23.2    .729    11.2    35.8    47.0    27.0    7.6 5.1 16.2    21.0    116.2   
Los Angeles Clippers    64  241.2   41.6    89.7    .464    12.2    33.2    .366    29.5    56.5    .522    20.8    26.2    .792    11.0    37.0    48.0    23.8    7.1 5.0 14.8    22.0    116.2   
Washington Wizards  64  241.2   41.9    91.0    .461    12.3    33.1    .372    29.6    57.9    .511    19.5    24.8    .787    10.1    31.6    41.7    25.3    8.1 4.3 14.1    22.6    115.6   
Memphis Grizzlies   65  240.4   42.8    91.0    .470    10.9    31.1    .352    31.8    59.9    .531    16.2    21.3    .761    10.4    36.3    46.7    27.0    8.0 5.6 15.3    20.8    112.6   
Phoenix Suns    65  241.2   40.8    87.8    .464    11.2    31.7    .353    29.6    56.1    .527    19.8    24.0    .826    9.8 33.3    43.1    27.2    7.8 4.0 15.1    22.1    112.6   
Miami Heat  65  243.5   39.6    84.4    .470    13.4    34.8    .383    26.3    49.6    .530    19.5    25.1    .778    8.5 36.0    44.5    26.0    7.4 4.5 14.9    20.4    112.2   
Minnesota Timberwolves  64  243.1   40.4    91.6    .441    13.3    39.7    .336    27.1    52.0    .521    19.1    25.4    .753    10.5    34.3    44.8    23.8    8.7 5.7 15.3    21.4    113.3   
Boston Celtics* 64  242.0   41.2    89.6    .459    12.4    34.2    .363    28.8    55.4    .519    18.3    22.8    .801    10.7    35.3    46.0    22.8    8.3 5.6 13.6    21.4    113.0   
Toronto Raptors*    64  241.6   40.6    88.5    .458    13.8    37.0    .371    26.8    51.5    .521    18.1    22.6    .800    9.7 35.5    45.2    25.4    8.8 4.9 14.4    21.5    113.0   
Los Angeles Lakers* 63  240.8   42.9    88.6    .485    11.2    31.4    .355    31.8    57.1    .556    17.3    23.7    .730    10.6    35.5    46.1    25.9    8.6 6.8 15.1    20.6    114.3   
Denver Nuggets  65  242.3   41.8    88.9    .471    10.9    30.4    .358    31.0    58.5    .529    15.9    20.5    .775    10.8    33.5    44.3    26.5    8.1 4.6 13.7    20.0    110.4   
San Antonio Spurs   63  242.8   42.0    89.5    .470    10.7    28.7    .371    31.4    60.8    .517    18.4    22.8    .809    8.8 35.6    44.4    24.5    7.2 5.5 12.3    19.2    113.2   
Philadelphia 76ers  65  241.2   40.8    87.7    .465    11.4    31.6    .362    29.4    56.1    .523    16.6    22.1    .752    10.4    35.1    45.5    25.9    8.2 5.4 14.2    20.6    109.6   
Indiana Pacers  65  241.5   42.2    88.4    .477    10.0    27.5    .363    32.2    60.9    .529    15.1    19.1    .787    8.8 34.0    42.8    25.9    7.2 5.1 13.1    19.6    109.3   
Utah Jazz   64  240.4   40.1    84.6    .475    13.2    34.4    .383    27.0    50.2    .537    17.6    22.8    .772    8.8 36.3    45.1    22.2    5.9 4.0 14.9    20.0    111.0   
Oklahoma City Thunder   64  241.6   40.3    85.1    .473    10.4    29.3    .355    29.9    55.8    .536    19.8    24.8    .797    8.1 34.6    42.7    21.9    7.6 5.0 13.5    18.8    110.8   
Brooklyn Nets   64  243.1   40.0    90.0    .444    12.9    37.9    .340    27.1    52.2    .519    18.0    24.1    .744    10.8    37.6    48.5    24.0    6.5 4.6 15.5    20.7    110.8   
Detroit Pistons 66  241.9   39.3    85.7    .459    12.0    32.7    .367    27.3    53.0    .515    16.6    22.4    .743    9.8 32.0    41.7    24.1    7.4 4.5 15.3    19.7    107.2   
New York Knicks 66  241.9   40.0    89.3    .447    9.6 28.4    .337    30.4    61.0    .499    16.3    23.5    .694    12.0    34.5    46.5    22.1    7.6 4.7 14.3    22.2    105.8   
Sacramento Kings    64  242.3   40.4    87.8    .459    12.6    34.7    .364    27.7    53.2    .522    15.6    20.3    .769    9.6 32.9    42.5    23.4    7.6 4.2 14.4    21.9    109.0   
Cleveland Cavaliers 65  241.9   40.3    87.9    .458    11.2    31.8    .351    29.1    56.1    .519    15.1    19.9    .758    10.8    33.4    44.2    23.1    6.9 3.2 16.5    18.3    106.9   
Chicago Bulls   65  241.2   39.6    88.6    .447    12.2    35.1    .348    27.4    53.5    .511    15.5    20.5    .755    10.5    31.4    41.9    23.2    10.0    4.1 15.5    21.8    106.8   
Orlando Magic   65  240.4   39.2    88.8    .442    10.9    32.0    .341    28.3    56.8    .498    17.0    22.1    .770    10.4    34.2    44.5    24.0    8.4 5.7 12.6    17.6    106.4   
Golden State Warriors   65  241.9   38.6    88.2    .438    10.4    31.3    .334    28.2    56.9    .495    18.7    23.2    .803    10.0    32.9    42.8    25.6    8.2 4.6 14.9    20.1    106.3   
Charlotte Hornets   65  242.3   37.3    85.9    .434    12.1    34.3    .352    25.2    51.6    .489    16.2    21.6    .748    11.0    31.8    42.8    23.8    6.6 4.1 14.6    18.8    102.9   
League Average  65  241.7   40.8    88.8    .460    12.1    33.9    .357    28.7    54.9    .523    17.7    22.9    .771    10.1    34.7    44.9    24.3    7.7 4.9 14.5    20.6    111.4   
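
If you need several tables from pages like this, the same idea can be wrapped in a small helper. The sketch below is hypothetical (find_commented_table is not part of BeautifulSoup): it first tries the normal parse tree, then re-parses the text of every comment on the page and searches inside each one. raw_html is again a made-up, trimmed-down stand-in for the real page.

```python
from bs4 import BeautifulSoup, Comment


def find_commented_table(soup, table_id):
    """Return the <table> with id=table_id, also searching inside HTML comments.

    Hypothetical helper (not part of BeautifulSoup); returns None if not found.
    """
    # Try the normal parse tree first, in case the table is not commented out.
    table = soup.find('table', id=table_id)
    if table is not None:
        return table
    # Otherwise re-parse the text of every comment and search inside it.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        table = BeautifulSoup(comment, 'html.parser').find('table', id=table_id)
        if table is not None:
            return table
    return None


# Hypothetical, trimmed-down stand-in for the real page.
raw_html = """<div id="all_team-stats-per_game"><!--
<table id="team-stats-per_game">
<tr><th>Team</th><th>G</th></tr>
<tr><th>1</th><td>Dallas Mavericks</td><td>67</td></tr>
<tr><th>2</th><td>Milwaukee Bucks*</td><td>65</td></tr>
</table>
--></div>"""

soup = BeautifulSoup(raw_html, 'html.parser')
table = find_commented_table(soup, 'team-stats-per_game')

# Same row selection as in the answer: only rows containing <td> cells.
rows = [[td.get_text(strip=True) for td in tr.select('td')]
        for tr in table.select('tr:has(td)')]
print(rows)  # [['Dallas Mavericks', '67'], ['Milwaukee Bucks*', '65']]
```

Checking the normal tree first means the helper keeps working if the site ever stops commenting out a given table.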
