Skip Certain Rows in Table Using BeautifulSoup

Question

I want to pull the entire table of 2018 NFL fantasy football statistics. The code below does this, but I am running into this error: 'NoneType' object has no attribute 'a'.

I figured out this occurs because the table repeats the header row every 30 rows. These rows do not contain the 'a' tag that all the other rows contain, and they have a different class, class = "thead". I found a similar problem from a few years ago but am having trouble adapting the solution to my code. Would appreciate any help!

from bs4 import BeautifulSoup
import requests

url = 'https://www.pro-football-reference.com'
year = 2018

r = requests.get(url + '/years/' + str(year) + '/fantasy.htm')
soup = BeautifulSoup(r.content, 'html.parser')
parsed_table = soup.find_all('table')[0]  
# first 2 rows are col headers so skip them with [2:]
for i,row in enumerate(parsed_table.find_all('tr')[2:]):
    print(i)
    dat = row.find('td', attrs={'data-stat': 'player'})
    name = dat.a.get_text()
    stub = dat.a.get('href')

chitown88 · Accepted Answer

You just need a bit of logic. There are a number of ways to check whether a row has an <a> tag.

What I did was simply add if dat:. Since dat = row.find('td', attrs={'data-stat': 'player'}), a row with no matching cell gives back None, which is falsy, so the code skips that row instead of trying to read the <a> tag.

Also, just as a note, since you are grabbing the first <table> tag (i.e. soup.find_all('table')[0]), you can simply use .find(), as that returns the first instance it finds.

from bs4 import BeautifulSoup
import requests

url = 'https://www.pro-football-reference.com'
year = 2018

r = requests.get(f'{url}/years/{year}/fantasy.htm')
soup = BeautifulSoup(r.content, 'html.parser')
parsed_table = soup.find('table')
# first 2 rows are col headers so skip them with [2:]
for i,row in enumerate(parsed_table.find_all('tr')[2:]):
    print(i)
    dat = row.find('td', attrs={'data-stat': 'player'})
    if dat:
        name = dat.a.get_text()
        stub = dat.a.get('href')
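
Since the question mentions that the repeated header rows carry class = "thead", another option is to skip those rows by class before looking for the player cell. Below is a minimal sketch of that idea, not a tested scraper for the live site: SAMPLE_HTML is a made-up stand-in for the real table, the player names and hrefs in it are placeholders, and extract_players is a hypothetical helper name. It also checks dat.a, in case a player cell has no link.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the real page's table markup.
SAMPLE_HTML = """
<table>
  <tr><th>Rk</th><th>Player</th></tr>
  <tr><td data-stat="player"><a href="/players/A/AaaaAa00.htm">Player One</a></td></tr>
  <tr class="thead"><th>Rk</th><th>Player</th></tr>
  <tr><td data-stat="player"><a href="/players/B/BbbbBb00.htm">Player Two</a></td></tr>
</table>
"""


def extract_players(html):
    """Return (name, href) pairs, skipping repeated header rows."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')
    players = []
    for row in table.find_all('tr'):
        # row.get('class') is a list of class names, or None if absent
        if 'thead' in (row.get('class') or []):
            continue
        dat = row.find('td', attrs={'data-stat': 'player'})
        # also skips the top header row and any player cell without a link
        if dat is None or dat.a is None:
            continue
        players.append((dat.a.get_text(), dat.a.get('href')))
    return players


print(extract_players(SAMPLE_HTML))
```

With the None checks in place, the [2:] slice is no longer needed, because header rows simply fall through the checks.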
