How to skip over certain rows in table when web scraping

Question

I'm scraping from this link: https://www.pro-football-reference.com/boxscores/201809060phi.htm

My code is as follows:

import requests
from bs4 import BeautifulSoup

# assign url
url = 'https://www.pro-football-reference.com/boxscores/201809060phi.htm'

#parse and format url
r = requests.get(url).text
res = r.replace("<!--","").replace("-->","")
soup = BeautifulSoup(res, 'lxml')


#get tables
tables = soup.findAll("div",{"class":"table_outer_container"})

#get offense_stats table
offense_table = tables[5]
rows = offense_table.tbody.findAll("tr")

#here i want to iterate through the player rows and pull their stats

player = test_row.find("th",{"data-stat":"player"}).text
carries = test_row.find("td",{"data-stat":"rush_att"}).text
rush_yds = test_row.find("td",{"data-stat":"rush_yds"}).text
rush_tds = test_row.find("td",{"data-stat":"rush_td"}).text
targets = test_row.find("td",{"data-stat":"targets"}).text
recs = test_row.find("td",{"data-stat":"rec"}).text
rec_yds= test_row.find("td",{"data-stat":"rec_yds"}).text
rec_tds= test_row.find("td",{"data-stat":"rec_td"}).text

The table on the page that I need (offensive stats) has the stats for all the players in the game. I want to iterate through the rows pulling the stats for each player. Problem is that there are two rows in the middle that are headers and not player stats. My "rows" variable pulled all "tr" elements in the "tbody" of my "offense_table" variable. This includes the two header rows that I do not want. They would be rows[8] and rows[9] in this particular case, but that could be different from game to game.

#this is how the data rows begin (the ones I want)


#and this is how the header rows begin (the ones I want to skip over)

Anybody know a way for me to ignore these rows when iterating through?

Danil · Accepted Answer

To select only the tr elements that have no class attribute, replace

rows = offense_table.tbody.findAll("tr")

with

rows = offense_table.findAll("tr", attrs={'class': None})
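
To see the filter in isolation, here is a minimal sketch run against a small inline table rather than the live page. The sample HTML, the player names, and the "thead" class on the repeated header row are assumptions made for illustration; the real offense table has more columns and its header rows may carry a different class. The sketch keeps the question's .tbody scoping so that rows from the table's own header section are not picked up.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the offense table (assumption:
# mid-table header rows carry a class attribute, data rows carry none).
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><th data-stat="player">Player A</th><td data-stat="rush_att">10</td><td data-stat="rush_yds">45</td></tr>
    <tr class="thead"><th data-stat="player">Player</th><th data-stat="rush_att">Att</th><th data-stat="rush_yds">Yds</th></tr>
    <tr><th data-stat="player">Player B</th><td data-stat="rush_att">3</td><td data-stat="rush_yds">12</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A value of None matches only <tr> tags that have no class attribute,
# so the row with class="thead" is left out.
rows = soup.tbody.find_all("tr", attrs={"class": None})

# Pull a few stats from each remaining (player) row.
stats = []
for row in rows:
    stats.append({
        "player": row.find("th", {"data-stat": "player"}).text,
        "carries": row.find("td", {"data-stat": "rush_att"}).text,
        "rush_yds": row.find("td", {"data-stat": "rush_yds"}).text,
    })

print(stats)
```

The same loop body can be extended with the remaining data-stat lookups from the question (rush_td, targets, rec, rec_yds, rec_td). Because the filter relies on the class attribute rather than on row positions, it does not depend on where the header rows fall in a given game.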
