Reputation: 1
I am trying to parse a Basketball Reference player page to extract one of its tables and work with the data. For some reason, though, BeautifulSoup cannot find that table. Searching for other tables on the same page works, but this specific one is never found.
I have the following line which takes a link to the page of the specific player I am searching for and gets the BeautifulSoup version of it:
page_soup = BeautifulSoup(bball_ref_page.content, 'lxml')
I then search for the table with the following line:
table = page_soup.find('table', attrs={'id': 'per_poss'})
Whenever I try to print(table), it just comes out as None.
I have also tried searching for the contents by doing:
table = page_soup.find(attrs={'id': 'per_poss'})
This gives the same result of None.
I have also tried searching for all tables in page_soup, and it returns a list of tables that does not include the one I am looking for.
I have tried changing the parser in the page_soup assignment to html.parser, and the result remains the same. I have also tried printing the contents of page_soup, and I can find the table in there:
<div class="table_container current" id="div_per_poss">
<table class="stats_table sortable row_summable" id="per_poss" data-cols-to-freeze="1,3"> <caption>Per 100 Poss Table</caption> <colgroup><col>....
Any ideas what might be causing this to happen?
Upvotes: 0
Views: 385
Reputation: 195438
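The page stores the <table> data inside an HTML comment (<!-- ... -->), so BeautifulSoup sees it as a single comment string rather than as tags, and find() cannot match it. To load the table as a pandas dataframe, you can parse the contents of the comments a second time, as in the following example: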
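from io import StringIO
import requests
import pandas as pd
from bs4 import BeautifulSoup, Comment
url = "https://www.basketball-reference.com/players/j/jordami01.html"
soup = BeautifulSoup(requests.get(url).content, "lxml")
# Collect the HTML comment nodes and parse their contents as a second document
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
soup = BeautifulSoup("\n".join(comments), "lxml")
# Recent pandas versions expect a file-like object rather than a raw HTML string
df = pd.read_html(StringIO(str(soup.select_one("table#per_poss"))))[0]
print(df.to_markdown())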
Prints:
|    | Season | Age | Tm | Lg | Pos | G | GS | MP | FG | FGA | FG% | 3P | 3PA | 3P% | 2P | 2PA | 2P% | FT | FTA | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | Unnamed: 29 | ORtg | DRtg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1984-85 | 21 | CHI | NBA | SG | 82 | 82 | 3144 | 12.9 | 25 | 0.515 | 0.1 | 0.8 | 0.173 | 12.7 | 24.2 | 0.526 | 9.7 | 11.5 | 0.845 | 2.6 | 5.6 | 8.2 | 7.4 | 3 | 1.1 | 4.5 | 4.4 | 35.5 | nan | 118 | 107 |
1 | 1985-86 | 22 | CHI | NBA | SG | 18 | 7 | 451 | 16 | 35 | 0.457 | 0.3 | 1.9 | 0.167 | 15.7 | 33.1 | 0.474 | 11.2 | 13.3 | 0.84 | 2.5 | 4.4 | 6.8 | 5.7 | 3.9 | 2.2 | 4.8 | 4.9 | 43.5 | nan | 109 | 107 |
2 | 1986-87 | 23 | CHI | NBA | SG | 82 | 82 | 3281 | 16.8 | 34.8 | 0.482 | 0.2 | 1 | 0.182 | 16.6 | 33.8 | 0.491 | 12.7 | 14.8 | 0.857 | 2.5 | 4 | 6.6 | 5.8 | 3.6 | 1.9 | 4.2 | 3.6 | 46.4 | nan | 117 | 104 |
3 | 1987-88 | 24 | CHI | NBA | SG | 82 | 82 | 3311 | 16.2 | 30.3 | 0.535 | 0.1 | 0.8 | 0.132 | 16.1 | 29.5 | 0.546 | 11 | 13.1 | 0.841 | 2.1 | 4.7 | 6.8 | 7.4 | 3.9 | 2 | 3.8 | 4.1 | 43.6 | nan | 123 | 101 |
4 | 1988-89 | 25 | CHI | NBA | SG | 81 | 81 | 3255 | 14.7 | 27.3 | 0.538 | 0.4 | 1.5 | 0.276 | 14.3 | 25.8 | 0.553 | 10.2 | 12.1 | 0.85 | 2.3 | 7.6 | 9.9 | 9.9 | 3.6 | 1 | 4.4 | 3.8 | 40 | nan | 123 | 103 |
5 | 1989-90 | 26 | CHI | NBA | SG | 82 | 82 | 3197 | 16 | 30.5 | 0.526 | 1.4 | 3.8 | 0.376 | 14.6 | 26.7 | 0.548 | 9.2 | 10.8 | 0.848 | 2.2 | 6.6 | 8.8 | 8.1 | 3.5 | 0.8 | 3.8 | 3.7 | 42.7 | nan | 123 | 106 |
6 | 1990-91 | 27 | CHI | NBA | SG | 82 | 82 | 3034 | 16.4 | 30.4 | 0.539 | 0.5 | 1.5 | 0.312 | 15.9 | 28.9 | 0.551 | 9.4 | 11.1 | 0.851 | 2 | 6.2 | 8.1 | 7.5 | 3.7 | 1.4 | 3.3 | 3.8 | 42.7 | nan | 125 | 102 |
7 | 1991-92 | 28 | CHI | NBA | SG | 80 | 80 | 3102 | 15.5 | 29.8 | 0.519 | 0.4 | 1.6 | 0.27 | 15 | 28.2 | 0.533 | 8 | 9.7 | 0.832 | 1.5 | 6.9 | 8.4 | 8 | 3 | 1.2 | 3.3 | 3.3 | 39.4 | nan | 121 | 102 |
8 | 1992-93 | 29 | CHI | NBA | SG | 78 | 78 | 3067 | 16.8 | 33.9 | 0.495 | 1.4 | 3.9 | 0.352 | 15.4 | 30 | 0.514 | 8.1 | 9.6 | 0.837 | 2.3 | 6.5 | 8.8 | 7.2 | 3.7 | 1 | 3.5 | 3.2 | 43 | nan | 119 | 102 |
9 | 1994-95 | 31 | CHI | NBA | SG | 17 | 17 | 668 | 13 | 31.5 | 0.411 | 1.2 | 2.5 | 0.5 | 11.7 | 29 | 0.403 | 8.5 | 10.6 | 0.801 | 2 | 7.2 | 9.1 | 7 | 2.3 | 1 | 2.7 | 3.7 | 35.7 | nan | 109 | 103 |
10 | 1995-96 | 32 | CHI | NBA | SG | 82 | 82 | 3090 | 15.6 | 31.5 | 0.495 | 1.9 | 4.4 | 0.427 | 13.7 | 27.1 | 0.506 | 9.3 | 11.2 | 0.834 | 2.5 | 6.7 | 9.3 | 6 | 3.1 | 0.7 | 3.4 | 3.3 | 42.5 | nan | 124 | 100 |
11 | 1996-97 | 33 | CHI | NBA | SG | 82 | 82 | 3106 | 15.8 | 32.5 | 0.486 | 1.9 | 5.1 | 0.374 | 13.9 | 27.4 | 0.507 | 8.2 | 9.9 | 0.833 | 1.9 | 6.3 | 8.3 | 6 | 2.4 | 0.8 | 2.9 | 2.7 | 41.8 | nan | 121 | 102 |
12 | 1997-98 | 34 | CHI | NBA | SG | 82 | 82 | 3181 | 14.9 | 32.1 | 0.465 | 0.5 | 2.1 | 0.238 | 14.4 | 30 | 0.482 | 9.6 | 12.2 | 0.784 | 2.2 | 5.8 | 8.1 | 4.8 | 2.4 | 0.8 | 3.1 | 2.6 | 40 | nan | 114 | 100 |
13 | 2001-02 | 38 | WAS | NBA | SF | 60 | 53 | 2093 | 14.3 | 34.4 | 0.416 | 0.3 | 1.4 | 0.189 | 14 | 33 | 0.426 | 6.8 | 8.6 | 0.79 | 1.3 | 7.5 | 8.8 | 8 | 2.2 | 0.7 | 4.2 | 3.1 | 35.7 | nan | 99 | 105 |
14 | 2002-03 | 39 | WAS | NBA | SF | 82 | 67 | 3031 | 12.2 | 27.4 | 0.445 | 0.3 | 1 | 0.291 | 11.9 | 26.4 | 0.45 | 4.8 | 5.8 | 0.821 | 1.3 | 7.7 | 8.9 | 5.6 | 2.2 | 0.7 | 3.1 | 3.1 | 29.5 | nan | 101 | 103 |
15 | Career | nan | nan | NBA | nan | 1072 | 1039 | 41011 | 15.3 | 30.7 | 0.497 | 0.7 | 2.2 | 0.327 | 14.5 | 28.5 | 0.51 | 9.2 | 11 | 0.835 | 2.1 | 6.3 | 8.3 | 7 | 3.1 | 1.1 | 3.7 | 3.5 | 40.4 | nan | 118 | 103 |
16 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
17 | 13 seasons | nan | CHI | NBA | nan | 930 | 919 | 35887 | 15.5 | 30.8 | 0.505 | 0.8 | 2.4 | 0.332 | 14.8 | 28.4 | 0.52 | 9.6 | 11.5 | 0.838 | 2.2 | 6.1 | 8.3 | 7.1 | 3.3 | 1.2 | 3.7 | 3.5 | 41.5 | nan | 120 | 103 |
18 | 2 seasons | nan | WAS | NBA | nan | 142 | 120 | 5124 | 13.1 | 30.3 | 0.431 | 0.3 | 1.1 | 0.241 | 12.8 | 29.1 | 0.439 | 5.6 | 7 | 0.805 | 1.3 | 7.6 | 8.9 | 6.6 | 2.2 | 0.7 | 3.6 | 3.1 | 32 | nan | 100 | 104 |
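To see why the first parse misses the table, here is a minimal offline sketch. The HTML string is a hypothetical, trimmed-down stand-in for the real page (an assumption for illustration, not the site's actual markup), and it uses html.parser so that it runs without lxml:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, trimmed-down page: the table exists only inside a comment.
html = """
<div id="all_per_poss">
<!--
<div class="table_container" id="div_per_poss">
<table id="per_poss"><caption>Per 100 Poss Table</caption>
<tr><th>Season</th><th>PTS</th></tr>
<tr><td>1984-85</td><td>35.5</td></tr>
</table>
</div>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The commented-out markup is a single text node, so no <table> tag exists yet.
print(soup.find("table", id="per_poss"))  # None

# Collect the comment nodes and parse their contents as a second document.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
inner = BeautifulSoup("\n".join(comments), "html.parser")

table = inner.find("table", id="per_poss")
print(table.caption.get_text())  # Per 100 Poss Table
```

This also explains why printing page_soup shows the table: the comment text is still part of the document, it just has not been parsed into tags.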
To iterate over the rows of the dataframe, you can use df.iterrows(), for example:
for index, row in df.iterrows():
    print(row["Season"], row["Age"])
Prints:
1984-85 21.0
1985-86 22.0
1986-87 23.0
1987-88 24.0
1988-89 25.0
...
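The printed table also contains a blank spacer row (index 16) and an empty "Unnamed: 29" column. If those get in the way, one possible cleanup is sketched below. The small dataframe is a hand-built stand-in for the scraped one (an assumption for illustration, reusing a few values from the output above), and itertuples() is shown as an alternative to iterrows():

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the scraped dataframe, with a few values copied
# from the output above, one all-NaN row and one all-NaN column.
df = pd.DataFrame({
    "Season": ["1984-85", "Career", np.nan, "13 seasons"],
    "Age": [21.0, np.nan, np.nan, np.nan],
    "PTS": [35.5, 40.4, np.nan, 41.5],
    "Unnamed: 29": [np.nan, np.nan, np.nan, np.nan],
})

# Drop columns, then rows, that contain nothing but NaN.
clean = df.dropna(axis="columns", how="all").dropna(axis="index", how="all")

# itertuples() yields one namedtuple per row; fields are accessed by column name.
for row in clean.itertuples(index=False):
    print(row.Season, row.PTS)
```

Rows that are only partly empty, such as the "Career" summary line, are kept, because how="all" removes a row only when every value in it is missing.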
Upvotes: 1