ParkerHarrelson1000

Beautiful Soup not finding specific table by ID

I am trying to parse a Basketball Reference player page to extract one of its tables and work with the data in it. For some reason, though, Beautiful Soup cannot find this table. When I search for other tables on the page, Beautiful Soup finds them successfully, but it will not find this specific one.

The following line takes the downloaded page of the player I am searching for (bball_ref_page) and parses it with BeautifulSoup:

page_soup = BeautifulSoup(bball_ref_page.content, 'lxml')

I then search for the table with the following line:

table = page_soup.find('table', attrs={'id': 'per_poss'})

Whenever I print(table), the result is just None. I have also tried searching for the element without restricting the search to table tags:

table = page_soup.find(attrs={'id': 'per_poss'})

The result is the same: None.

I have also tried finding all tables in page_soup; it returns a list of many tables, but not the one I am looking for.

I have tried changing the parser in the page_soup assignment to html.parser, and the result remains the same. I have also printed the contents of page_soup, and I can find the table in there:

<div class="table_container current" id="div_per_poss">
        
        <table class="stats_table sortable row_summable" id="per_poss" data-cols-to-freeze="1,3"> <caption>Per 100 Poss Table</caption> <colgroup><col>....

Any ideas what might be causing this to happen?

Answers (1)

Andrej Kesely

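The page stores the <table> data inside an HTML comment (<!-- -->), so BeautifulSoup parses it as a Comment string rather than as a <table> tag, and find() doesn't see it. To load the table as a pandas DataFrame, you can use the following example, which parses the text of the comments again as HTML: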

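import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup, Comment


url = "https://www.basketball-reference.com/players/j/jordami01.html"

soup = BeautifulSoup(requests.get(url).content, "lxml")

# Collect the text of every HTML comment and parse it again as HTML,
# so the tables hidden inside the comments become real <table> tags.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
soup = BeautifulSoup("\n".join(comments), "lxml")

# Newer pandas versions expect a file-like object instead of a literal HTML string.
df = pd.read_html(StringIO(str(soup.select_one("table#per_poss"))))[0]
print(df.to_markdown())  # to_markdown() requires the tabulate package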

Prints:

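| | Season | Age | Tm | Lg | Pos | G | GS | MP | FG | FGA | FG% | 3P | 3PA | 3P% | 2P | 2PA | 2P% | FT | FTA | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | Unnamed: 29 | ORtg | DRtg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1984-85 | 21 | CHI | NBA | SG | 82 | 82 | 3144 | 12.9 | 25 | 0.515 | 0.1 | 0.8 | 0.173 | 12.7 | 24.2 | 0.526 | 9.7 | 11.5 | 0.845 | 2.6 | 5.6 | 8.2 | 7.4 | 3 | 1.1 | 4.5 | 4.4 | 35.5 | nan | 118 | 107 |
| 1 | 1985-86 | 22 | CHI | NBA | SG | 18 | 7 | 451 | 16 | 35 | 0.457 | 0.3 | 1.9 | 0.167 | 15.7 | 33.1 | 0.474 | 11.2 | 13.3 | 0.84 | 2.5 | 4.4 | 6.8 | 5.7 | 3.9 | 2.2 | 4.8 | 4.9 | 43.5 | nan | 109 | 107 |
| 2 | 1986-87 | 23 | CHI | NBA | SG | 82 | 82 | 3281 | 16.8 | 34.8 | 0.482 | 0.2 | 1 | 0.182 | 16.6 | 33.8 | 0.491 | 12.7 | 14.8 | 0.857 | 2.5 | 4 | 6.6 | 5.8 | 3.6 | 1.9 | 4.2 | 3.6 | 46.4 | nan | 117 | 104 |
| 3 | 1987-88 | 24 | CHI | NBA | SG | 82 | 82 | 3311 | 16.2 | 30.3 | 0.535 | 0.1 | 0.8 | 0.132 | 16.1 | 29.5 | 0.546 | 11 | 13.1 | 0.841 | 2.1 | 4.7 | 6.8 | 7.4 | 3.9 | 2 | 3.8 | 4.1 | 43.6 | nan | 123 | 101 |
| 4 | 1988-89 | 25 | CHI | NBA | SG | 81 | 81 | 3255 | 14.7 | 27.3 | 0.538 | 0.4 | 1.5 | 0.276 | 14.3 | 25.8 | 0.553 | 10.2 | 12.1 | 0.85 | 2.3 | 7.6 | 9.9 | 9.9 | 3.6 | 1 | 4.4 | 3.8 | 40 | nan | 123 | 103 |
| 5 | 1989-90 | 26 | CHI | NBA | SG | 82 | 82 | 3197 | 16 | 30.5 | 0.526 | 1.4 | 3.8 | 0.376 | 14.6 | 26.7 | 0.548 | 9.2 | 10.8 | 0.848 | 2.2 | 6.6 | 8.8 | 8.1 | 3.5 | 0.8 | 3.8 | 3.7 | 42.7 | nan | 123 | 106 |
| 6 | 1990-91 | 27 | CHI | NBA | SG | 82 | 82 | 3034 | 16.4 | 30.4 | 0.539 | 0.5 | 1.5 | 0.312 | 15.9 | 28.9 | 0.551 | 9.4 | 11.1 | 0.851 | 2 | 6.2 | 8.1 | 7.5 | 3.7 | 1.4 | 3.3 | 3.8 | 42.7 | nan | 125 | 102 |
| 7 | 1991-92 | 28 | CHI | NBA | SG | 80 | 80 | 3102 | 15.5 | 29.8 | 0.519 | 0.4 | 1.6 | 0.27 | 15 | 28.2 | 0.533 | 8 | 9.7 | 0.832 | 1.5 | 6.9 | 8.4 | 8 | 3 | 1.2 | 3.3 | 3.3 | 39.4 | nan | 121 | 102 |
| 8 | 1992-93 | 29 | CHI | NBA | SG | 78 | 78 | 3067 | 16.8 | 33.9 | 0.495 | 1.4 | 3.9 | 0.352 | 15.4 | 30 | 0.514 | 8.1 | 9.6 | 0.837 | 2.3 | 6.5 | 8.8 | 7.2 | 3.7 | 1 | 3.5 | 3.2 | 43 | nan | 119 | 102 |
| 9 | 1994-95 | 31 | CHI | NBA | SG | 17 | 17 | 668 | 13 | 31.5 | 0.411 | 1.2 | 2.5 | 0.5 | 11.7 | 29 | 0.403 | 8.5 | 10.6 | 0.801 | 2 | 7.2 | 9.1 | 7 | 2.3 | 1 | 2.7 | 3.7 | 35.7 | nan | 109 | 103 |
| 10 | 1995-96 | 32 | CHI | NBA | SG | 82 | 82 | 3090 | 15.6 | 31.5 | 0.495 | 1.9 | 4.4 | 0.427 | 13.7 | 27.1 | 0.506 | 9.3 | 11.2 | 0.834 | 2.5 | 6.7 | 9.3 | 6 | 3.1 | 0.7 | 3.4 | 3.3 | 42.5 | nan | 124 | 100 |
| 11 | 1996-97 | 33 | CHI | NBA | SG | 82 | 82 | 3106 | 15.8 | 32.5 | 0.486 | 1.9 | 5.1 | 0.374 | 13.9 | 27.4 | 0.507 | 8.2 | 9.9 | 0.833 | 1.9 | 6.3 | 8.3 | 6 | 2.4 | 0.8 | 2.9 | 2.7 | 41.8 | nan | 121 | 102 |
| 12 | 1997-98 | 34 | CHI | NBA | SG | 82 | 82 | 3181 | 14.9 | 32.1 | 0.465 | 0.5 | 2.1 | 0.238 | 14.4 | 30 | 0.482 | 9.6 | 12.2 | 0.784 | 2.2 | 5.8 | 8.1 | 4.8 | 2.4 | 0.8 | 3.1 | 2.6 | 40 | nan | 114 | 100 |
| 13 | 2001-02 | 38 | WAS | NBA | SF | 60 | 53 | 2093 | 14.3 | 34.4 | 0.416 | 0.3 | 1.4 | 0.189 | 14 | 33 | 0.426 | 6.8 | 8.6 | 0.79 | 1.3 | 7.5 | 8.8 | 8 | 2.2 | 0.7 | 4.2 | 3.1 | 35.7 | nan | 99 | 105 |
| 14 | 2002-03 | 39 | WAS | NBA | SF | 82 | 67 | 3031 | 12.2 | 27.4 | 0.445 | 0.3 | 1 | 0.291 | 11.9 | 26.4 | 0.45 | 4.8 | 5.8 | 0.821 | 1.3 | 7.7 | 8.9 | 5.6 | 2.2 | 0.7 | 3.1 | 3.1 | 29.5 | nan | 101 | 103 |
| 15 | Career | nan | nan | NBA | nan | 1072 | 1039 | 41011 | 15.3 | 30.7 | 0.497 | 0.7 | 2.2 | 0.327 | 14.5 | 28.5 | 0.51 | 9.2 | 11 | 0.835 | 2.1 | 6.3 | 8.3 | 7 | 3.1 | 1.1 | 3.7 | 3.5 | 40.4 | nan | 118 | 103 |
| 16 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| 17 | 13 seasons | nan | CHI | NBA | nan | 930 | 919 | 35887 | 15.5 | 30.8 | 0.505 | 0.8 | 2.4 | 0.332 | 14.8 | 28.4 | 0.52 | 9.6 | 11.5 | 0.838 | 2.2 | 6.1 | 8.3 | 7.1 | 3.3 | 1.2 | 3.7 | 3.5 | 41.5 | nan | 120 | 103 |
| 18 | 2 seasons | nan | WAS | NBA | nan | 142 | 120 | 5124 | 13.1 | 30.3 | 0.431 | 0.3 | 1.1 | 0.241 | 12.8 | 29.1 | 0.439 | 5.6 | 7 | 0.805 | 1.3 | 7.6 | 8.9 | 6.6 | 2.2 | 0.7 | 3.6 | 3.1 | 32 | nan | 100 | 104 |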
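To see why find() returns None in the first place, here is a minimal, self-contained sketch. It does not download anything: the markup below is made up and only imitates the pattern described above (the ids per_game and all_per_poss and the helper find_commented_table are hypothetical, and the real page's markup may differ). It also shows a slightly more targeted variant of the fix: instead of joining every comment, it re-parses only the comments that mention the wanted id.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup mimicking the pattern described above: the table
# with id="per_poss" is wrapped in an HTML comment, another table is not.
html = """
<div id="all_per_game"><table id="per_game"><tr><td>visible</td></tr></table></div>
<div id="all_per_poss">
<!--
<table id="per_poss"><tr><th>Season</th><th>Age</th></tr>
<tr><td>1984-85</td><td>21</td></tr></table>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The commented-out table is stored as a Comment string, not as a <table> tag.
print(soup.find("table", id="per_game") is not None)  # True
print(soup.find("table", id="per_poss"))               # None


def find_commented_table(soup, table_id):
    """Return the first <table> with the given id found inside any HTML comment."""
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # Only re-parse comments whose text actually mentions the id we want.
        if f'id="{table_id}"' in comment:
            table = BeautifulSoup(comment, "html.parser").find("table", id=table_id)
            if table is not None:
                return table
    return None


table = find_commented_table(soup, "per_poss")
print(table.find("td").get_text())  # 1984-85
```

The substring check assumes the id is written as id="..." with double quotes; if a page uses a different quoting style, dropping that check and re-parsing every comment is the safer (if slower) choice.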

To iterate over the rows of the DataFrame, you can use df.iterrows(), for example:

for index, row in df.iterrows():
    print(row["Season"], row["Age"])

Prints:

1984-85 21.0
1985-86 22.0
1986-87 23.0
1987-88 24.0
1988-89 25.0

...
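If you only want the individual seasons, a small sketch under stated assumptions: it builds a hypothetical miniature DataFrame (a few values copied from the output above) instead of downloading the page, and it assumes that the summary rows (Career, the blank separator row and the per-team totals) can be recognized by a missing Age, which holds for the rows shown above but is not guaranteed for every player page.

```python
import pandas as pd

# Hypothetical miniature version of the per_poss DataFrame above: two season
# rows plus the kind of summary/blank rows the real table contains.
df = pd.DataFrame({
    "Season": ["1984-85", "1985-86", "Career", None, "13 seasons"],
    "Age": [21.0, 22.0, None, None, None],
    "PTS": [35.5, 43.5, 40.4, None, 41.5],
})

# Summary rows have no Age, so dropping rows where Age is missing
# keeps only the individual seasons.
seasons = df.dropna(subset=["Age"])

# itertuples() is generally faster than iterrows() and preserves dtypes;
# attribute access works here because these column names are valid identifiers.
for row in seasons.itertuples(index=False):
    print(row.Season, int(row.Age), row.PTS)
```

Columns such as FG% or 3P are not valid Python identifiers, so itertuples() renames them to positional names; for those columns, iterrows() with row["FG%"] is the more readable option.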