How to iterate over <td> tags using BS4?

Question

I'm running into some problems using BS4 to extract specific elements. The HTML is taken from the Texas Department of Corrections Executed Inmates page.

I've attached a screenshot for better understanding.

Within each tr tag, there are multiple td tags containing text about First Name, Last Name, TDCJ Number, Age, Date, etc.

How can I get BS4 to skip over the first tr tag (it holds the column names) and, for each subsequent tr tag, extract the text from its td tags?

from urllib.request import urlopen
from bs4 import BeautifulSoup
import csv

def main():
    gettabledata()

lstofinmates = list()

def gettabledata():
    with urlopen('https://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html') as response:
        # Parse while the response is still open; the with-block needs an indented body.
        soup = BeautifulSoup(response, 'html.parser')

    with open('exinmates.csv', 'w', newline='') as output_file:
        inmate_file_writer = csv.DictWriter(output_file,
                                           fieldnames=['First Name', 'Last Name', 'Execution Number',
                                                       'Last Statement', 'TDCJ Number', 'Age', 'Date Executed', 'Race',
                                                       'County'],
                                           extrasaction='ignore',
                                           delimiter=',', quotechar='"')
        inmate_file_writer.writeheader()
        table = soup.find('table').find('tbody')
        print (table)

if __name__ == '__main__':
    main()

I'm thinking of creating a LOD (list of dictionaries) structure where each dictionary holds one inmate's information: the text from the td fields is stored in the dictionary, and each dictionary is appended to a list. The problem is that I can't find a way to skip the first tr tag, or to iterate over the remaining tr tags and put their contents into a dictionary. Any suggestions/help? Thanks!

poke · Accepted Answer

Here is something to get you started:

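from bs4 import BeautifulSoup
html = '''<table>
  <caption>Executed Offenders</caption>
  <tbody>
    <tr><th>Execution</th><th>Link</th><th>Link</th><th>Last Name</th><th>First Name</th><th>TDCJ Number</th><th>Age</th><th>Date</th><th>Race</th><th>County</th></tr>
    <tr><td>542</td><td>Offender Information</td><td>Last Statement</td><td>Bigby</td><td>James</td><td>997</td><td>61</td><td>3/14/2017</td><td>White</td><td>Tarrant</td></tr>
    <tr><td>541</td><td>Offender Information</td><td>Last Statement</td><td>Ruiz</td><td>Rolando</td><td>999145</td><td>44</td><td>3/07/2017</td><td>Hispanic</td><td>Bexar</td></tr>
    <tr><td>540</td><td>Offender Information</td><td>Last Statement</td><td>Edwards</td><td>Terry</td><td>999463</td><td>43</td><td>1/26/2017</td><td>Black</td><td>Dallas</td></tr>
    <tr><td>539</td><td>Offender Information</td><td>Last Statement</td><td>Wilkins</td><td>Christopher</td><td>999533</td><td>48</td><td>01/11/2017</td><td>White</td><td>Tarrant</td></tr>
    <tr><td>538</td><td>Offender Information</td><td>Last Statement</td><td>Fuller</td><td>Barney</td><td>999481</td><td>58</td><td>10/05/2016</td><td>White</td><td>Houston</td></tr>
  </tbody>
</table>
'''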

soup = BeautifulSoup(html, 'html.parser')
rows = iter(soup.find('table').find_all('tr'))

# skip first row
next(rows)

for row in rows:
    for cell in row.find_all('td'):
        print(cell)
    print()
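
If you want the list-of-dictionaries structure mentioned in the question, the same skip-the-first-row idea can be extended roughly as follows. This is only a sketch: the function name table_to_dicts and the shortened sample_html table are made up for illustration, and it assumes the first row holds the column names and every later row has one cell per column.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample table; the real page has more columns and rows.
sample_html = '''<table>
<tr><th>Execution</th><th>Last Name</th><th>First Name</th><th>Age</th></tr>
<tr><td>542</td><td>Bigby</td><td>James</td><td>61</td></tr>
<tr><td>541</td><td>Ruiz</td><td>Rolando</td><td>44</td></tr>
</table>'''


def table_to_dicts(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = iter(soup.find('table').find_all('tr'))
    # The first row holds the column names, so consume it to build the keys.
    header = [cell.get_text(strip=True)
              for cell in next(rows).find_all(['th', 'td'])]
    inmates = []
    for row in rows:
        values = [cell.get_text(strip=True) for cell in row.find_all('td')]
        # zip pairs each column name with the cell in the same position.
        inmates.append(dict(zip(header, values)))
    return inmates


inmates = table_to_dicts(sample_html)
print(inmates[0])
```

One thing to watch for on the real table: two columns are both named "Link", so they would collide as dictionary keys and only the last one would survive. You would probably want to rename those columns (or drop them) before building the dictionaries.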
