CBW

Reputation: 35

How to remove headers that show up in table rows?

I'm looking for a way to remove all the duplicate header rows (those with the HTML class "thead") that show up among the table rows. Here is the code I have up to the point where I run into the problem:

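from urllib.request import urlopen
from bs4 import BeautifulSoup

for yr in years:
    try:
        url = 'https://www.pro-football-reference.com/years/' + yr + '/passing.htm'
        html = urlopen(url)

        soup = BeautifulSoup(html, "lxml")
        # Column names come from the first row on the page
        column_headers = [th.getText() for th in soup.findAll('tr', limit=2)[0].findAll('th')]
        # Skip the first row of the table; the repeated header rows are still included here
        table_rows = soup.select("#passing tr")[1:]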

Upvotes: 2

Views: 1382

Answers (1)

Keyur Potdar

Reputation: 7238

Since the rows you want don't have any class, and the ones you don't want look like this:

<tr class="thead">

you can simply use this to get all the rows you want:

table_rows = soup.find('table', id='passing').find_all('tr', class_=None)[1:]

Using class_=None matches only the tags that have no class attribute, so every row with any class name is skipped.
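If you would rather make the filter explicit, a plain list comprehension that checks each row's class list is another option. The snippet below is only a sketch. The sample HTML is a made-up stand-in for the real page, `data_rows` is a hypothetical helper name, and it uses the built-in "html.parser" so it runs without lxml:

```python
from bs4 import BeautifulSoup

# Made-up, trimmed-down stand-in for the real page (assumption, not the actual markup).
SAMPLE_HTML = """
<table id="passing">
  <tr><th>Player</th><th>Yds</th></tr>
  <tr><td>Player A</td><td>4000</td></tr>
  <tr class="thead"><th>Player</th><th>Yds</th></tr>
  <tr><td>Player B</td><td>3500</td></tr>
</table>
"""


def data_rows(soup, table_id="passing"):
    """Return the table's rows, minus the first row and any repeated header rows."""
    table = soup.find("table", id=table_id)
    rows = table.find_all("tr")[1:]  # drop the first (real) header row
    # Tag.get("class") returns a list of class names, or the default if absent.
    return [row for row in rows if "thead" not in row.get("class", [])]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = data_rows(soup)
cells = [[td.get_text() for td in row.find_all("td")] for row in rows]
print(cells)
```

Unlike class_=None, this only drops rows whose class list contains "thead", so rows carrying some other class would be kept.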

Upvotes: 2
