I want to pull the entire table of 2018 NFL fantasy football statistics. The code below does this, but I am running into this error: `'NoneType' object has no attribute 'a'`.
I figured out this happens because the table repeats the header row every 30 rows. Those rows don't contain the `<a>` tag that all the other rows have; instead they have a different class, `class="thead"`. I found a similar problem from a few years ago but am having trouble adapting the solution to my code. Would appreciate any help!
```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.pro-football-reference.com'
year = 2018

r = requests.get(url + '/years/' + str(year) + '/fantasy.htm')
soup = BeautifulSoup(r.content, 'html.parser')
parsed_table = soup.find_all('table')[0]

# first 2 rows are col headers so skip them with [2:]
for i, row in enumerate(parsed_table.find_all('tr')[2:]):
    print(i)
    dat = row.find('td', attrs={'data-stat': 'player'})
    name = dat.a.get_text()  # fails on the repeated header rows
    stub = dat.a.get('href')
```
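The failure can be reproduced offline with a tiny snippet. This is a minimal sketch: the markup and the `href` below are hypothetical, modeled on the question's description (a normal player row plus a repeated `class="thead"` header row built from `<th>` cells), not copied from pro-football-reference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the question's description: one player row,
# then a repeated header row (class="thead") built from <th> cells.
html = """
<table>
  <tr><td data-stat="player"><a href="/players/X/Example00.htm">Player A</a></td></tr>
  <tr class="thead"><th data-stat="player">Player</th></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table").find_all("tr")

for row in rows:
    dat = row.find("td", attrs={"data-stat": "player"})
    try:
        print(dat.a.get_text())
    except AttributeError as exc:
        # The header row has no <td data-stat="player">, so dat is None
        # and dat.a raises the same error as in the question.
        print(row.get("class"), "->", exc)
```

The player row prints its name; the header row has no matching `<td>`, so `dat` is `None` and accessing `.a` raises `AttributeError`.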
You just need a bit of logic to check that the player cell exists before reaching for its `<a>` tag. There are a number of ways to do that.
What I did was simply add `if dat:`. Since `dat = row.find('td', attrs={'data-stat': 'player'})` returns `None` when there is no matching cell (as on the repeated header rows), and `None` is falsy, the body of the `if` is skipped and the code never tries to get the `<a>` tag.
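Here is a minimal sketch of that guard on hypothetical inline HTML (not the real page markup). It uses explicit `is None` checks, which behave the same as `if dat:` here because `find()` returns either a `Tag` (always truthy) or `None`. It also skips a player cell that has no link; whether such cells actually appear on the real page is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical rows: a linked player, a player cell without a link,
# and a repeated header row made of <th> cells.
html = """
<table>
  <tr><td data-stat="player"><a href="/players/X/Example00.htm">Player A</a></td></tr>
  <tr><td data-stat="player">Player B</td></tr>
  <tr class="thead"><th data-stat="player">Player</th></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
players = []

for row in soup.find("table").find_all("tr"):
    dat = row.find("td", attrs={"data-stat": "player"})
    # find() returns None when no matching <td> exists (the header row).
    if dat is None:
        continue
    link = dat.a  # shorthand for dat.find("a"); None if the cell has no <a>
    if link is None:
        continue
    players.append((link.get_text(), link.get("href")))

print(players)
```

Only the linked player ends up in `players`; both the link-less cell and the header row are skipped without raising.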
Also, as a side note: since you are grabbing the first `<table>` tag (i.e. `soup.find_all('table')[0]`), you can simply use `.find()`, which returns the first match it finds.
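A quick check of the `.find()` point, using a hypothetical two-table snippet:

```python
from bs4 import BeautifulSoup

html = "<table id='first'></table><table id='second'></table>"
soup = BeautifulSoup(html, "html.parser")

first_via_find_all = soup.find_all("table")[0]
first_via_find = soup.find("table")

# Both return the same first <table> Tag object from the parse tree.
print(first_via_find is first_via_find_all)  # True
print(first_via_find["id"])                  # first

# When nothing matches, find() returns None, whereas
# find_all("div")[0] would raise IndexError on the empty result.
print(soup.find("div"))  # None
```

One practical difference: `.find()` fails softly with `None`, so a missing table surfaces as a `NoneType` error later rather than an `IndexError` at the lookup.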
```python
from bs4 import BeautifulSoup
import requests

url = 'https://www.pro-football-reference.com'
year = 2018

r = requests.get(f'{url}/years/{year}/fantasy.htm')
soup = BeautifulSoup(r.content, 'html.parser')
parsed_table = soup.find('table')

# first 2 rows are col headers so skip them with [2:]
for i, row in enumerate(parsed_table.find_all('tr')[2:]):
    print(i)
    dat = row.find('td', attrs={'data-stat': 'player'})
    # skip the repeated header rows, which have no player <td>
    if dat:
        name = dat.a.get_text()
        stub = dat.a.get('href')
```
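Since the question notes that the repeated header rows carry `class="thead"`, another option is to skip those rows by class instead of relying on the missing cell. This is a minimal sketch run against hypothetical inline HTML; the helper name `parse_players` is illustrative, and for the live page you would pass it the `parsed_table` from the code above (assuming the real markup matches the question's description).

```python
from bs4 import BeautifulSoup


def parse_players(table):
    """Return (name, href) pairs, skipping repeated header rows (hypothetical helper)."""
    players = []
    for row in table.find_all("tr"):
        # Tag.get("class") returns a list of class names, or None if absent.
        if "thead" in (row.get("class") or []):
            continue
        dat = row.find("td", attrs={"data-stat": "player"})
        # Also covers the top header rows, so no [2:] slice is needed.
        if dat is None or dat.a is None:
            continue
        players.append((dat.a.get_text(), dat.a.get("href")))
    return players


html = """
<table>
  <tr><th>Rk</th><th>Player</th></tr>
  <tr><td data-stat="player"><a href="/players/X/ExampA00.htm">Player A</a></td></tr>
  <tr class="thead"><th data-stat="player">Player</th></tr>
  <tr><td data-stat="player"><a href="/players/X/ExampB00.htm">Player B</a></td></tr>
</table>
"""

table = BeautifulSoup(html, "html.parser").find("table")
print(parse_players(table))
```

Filtering on the class makes the intent explicit, while the `None` check stays as a safety net for any other row without a player link.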