codextrmz

Reputation: 27

Skip Certain Rows in Table Using BeautifulSoup

I want to pull the entire table of 2018 NFL fantasy football statistics. The code below does this, but partway through I run into this error: 'NoneType' object has no attribute 'a'.

I figured out that this happens because the table repeats the header names every 30 rows. These rows do not contain the 'a' tag that all the other rows contain, but they do have a distinct class, class="thead". I found a similar problem from a few years ago but am having trouble adapting the solution to my code. I would appreciate any help!

import requests
from bs4 import BeautifulSoup

url = 'https://www.pro-football-reference.com'
year = 2018

r = requests.get(url + '/years/' + str(year) + '/fantasy.htm')
soup = BeautifulSoup(r.content, 'html.parser')
parsed_table = soup.find_all('table')[0]  
# first 2 rows are col headers so skip them with [2:]
for i,row in enumerate(parsed_table.find_all('tr')[2:]):
    print(i)
    dat = row.find('td', attrs={'data-stat': 'player'})
    name = dat.a.get_text()
    stub = dat.a.get('href')

Upvotes: 0

Views: 607

Answers (1)

chitown88

Reputation: 28565

You just need a bit of logic. There are a number of ways to do it; one is to check whether the row has a player cell before reaching for its <a> tag.

What I did was simply add if dat:. Since dat = row.find('td', attrs={'data-stat': 'player'}), a row with no matching <td> makes find() return None, which is falsy, so the code skips that row instead of trying to read the <a> tag.

Also, as a note: since you are grabbing the first <table> tag (i.e. soup.find_all('table')[0]), you can simply use .find(), which returns the first match.

from bs4 import BeautifulSoup
import requests

url = 'https://www.pro-football-reference.com'
year = 2018

r = requests.get(f'{url}/years/{year}/fantasy.htm')
soup = BeautifulSoup(r.content, 'html.parser')
parsed_table = soup.find('table')
# first 2 rows are col headers so skip them with [2:]
for i,row in enumerate(parsed_table.find_all('tr')[2:]):
    print(i)
    dat = row.find('td', attrs={'data-stat': 'player'})
    if dat:
        name = dat.a.get_text()
        stub = dat.a.get('href')
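
Since the question mentions that the repeated header rows carry class="thead", another option is to skip those rows explicitly. The following is only a minimal sketch that runs offline: SAMPLE_HTML, its player names and hrefs, and the helper name extract_players are all made up for illustration, and the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the real table (assumed markup).
SAMPLE_HTML = """
<table>
  <tr><th>Rk</th><th>Player</th></tr>
  <tr><td data-stat="player"><a href="/players/one.htm">Player One</a></td></tr>
  <tr class="thead"><th>Rk</th><th>Player</th></tr>
  <tr><td data-stat="player"><a href="/players/two.htm">Player Two</a></td></tr>
</table>
"""


def extract_players(html):
    """Return (name, href) pairs, skipping repeated header rows."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')
    players = []
    for row in table.find_all('tr'):
        # With html.parser, 'class' is a list of class names, or missing
        # entirely, in which case get() returns None.
        if 'thead' in (row.get('class') or []):
            continue
        cell = row.find('td', attrs={'data-stat': 'player'})
        # Guard against rows without a player cell or without a link.
        if cell is None or cell.a is None:
            continue
        players.append((cell.a.get_text(), cell.a.get('href')))
    return players


players = extract_players(SAMPLE_HTML)
print(players)
```

The None check is kept alongside the class check, so the top header row (which has no class in this sample) is skipped as well, without relying on the [2:] slice.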

Upvotes: 1
