Reputation: 35
I'm looking for a way to remove all the duplicate header rows with the HTML class "thead" that show up among the table rows. Here is the code I have up to the point where I run into the problem:
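from urllib.request import urlopen
from bs4 import BeautifulSoup

for yr in years:
    try:
        url = 'https://www.pro-football-reference.com/years/' + yr + '/passing.htm'
        html = urlopen(url)
        soup = BeautifulSoup(html, "lxml")
        # Column names come from the first header row
        column_headers = [th.getText() for th in soup.findAll('tr', limit=2)[0].findAll('th')]
        # Every row after the first; this still includes the repeated "thead" rows
        table_rows = soup.select("#passing tr")[1:]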
Upvotes: 2
Views: 1382
Reputation: 7238
Since the rows you want don't have any class, and the ones you don't want are marked like this:
<tr class="thead">
you can simply use this to get all the rows you want:
table_rows = soup.find('table', id='passing').find_all('tr', class_=None)[1:]
Passing class_=None matches only tags that have no class attribute, so every row that has a class name, including the thead rows, is skipped.
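If other rows in the table might carry classes of their own, a slightly more defensive variant is to drop only the rows whose class list contains "thead". The sketch below is an illustration, not the code from the answer: the inline HTML is a made-up stand-in for the real page, and it uses the built-in "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the passing table, with one repeated header row.
html = """
<table id="passing">
  <tr><th>Player</th><th>Yds</th></tr>
  <tr><td>A</td><td>4000</td></tr>
  <tr class="thead"><th>Player</th><th>Yds</th></tr>
  <tr><td>B</td><td>3500</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="passing")
all_rows = table.find_all("tr")

# Keep rows whose class list does not contain "thead";
# rows without a class attribute fall back to an empty list.
kept_rows = [row for row in all_rows if "thead" not in row.get("class", [])]

# Drop the first (unclassed) header row, as in the original code.
data_rows = kept_rows[1:]

players = [row.find("td").get_text() for row in data_rows]
print(players)
```

This filters on the specific class rather than on the absence of any class, so it keeps working if the site later adds classes to ordinary data rows.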
Upvotes: 2