n1c9

Reputation: 2687

Iterating through an entire table with BeautifulSoup

The code below scrapes the Wikipedia page listing currently sitting US senators, whose data is contained in a table. At the moment the code prints the state, name, party, etc. of only the first senator from Alabama. How can I rework it to iterate through the entire table?

from bs4 import BeautifulSoup
from urllib.request import urlopen

senatorwiki = 'https://en.wikipedia.org/wiki/List_of_current_United_States_Senators'
html = urlopen(senatorwiki)
soup = BeautifulSoup(html.read(), "lxml")

senatortable = soup.find('table',{'class':"sortable"})
td = senatortable.find('td')
state = td.find_next()
ns = state.find_next_sibling()
picture = ns.find_next_sibling()
name = picture.find_next_sibling()
party = name.find_next_sibling()
privsec = party.find_next_sibling()
print(state.text,ns.text,name.text,party.text,privsec.text)

Upvotes: 2

Views: 1408

Answers (1)

rebeling

Reputation: 728

To iterate over the table, use findAll to get every tr, and then every td inside each row. Note that I am using requests, not only because it is awesome, but also because urllib.request does not exist in Python 2.7.

from bs4 import BeautifulSoup
import requests

senatorwiki = 'https://en.wikipedia.org/wiki/List_of_current_United_States_Senators'
html = requests.get(senatorwiki)
soup = BeautifulSoup(html.text, "lxml")
senatortable = soup.find('table',{'class':"sortable"})
rows = senatortable.findAll('tr')

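for tr in rows:
    # Print the list of td cells in this row; print() with one argument works in Python 2 and 3.
    print(tr.findAll('td'))
    # Pulling the individual fields out of the list of tds is up to you ;)
    # print(state.text,ns.text,name.text,party.text,privsec.text)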
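
As a rough sketch of that last step, the loop can collect the text of each td into a list per row. To keep it runnable offline, this uses a small made-up table (`SAMPLE_HTML`, with placeholder names and parties) and the built-in "html.parser" instead of the live Wikipedia page; the helper name `table_rows` is also just an illustration. The real table's columns will differ, and rows there may not all have the same number of cells (for example, if a cell spans several rows), so treat the column positions as an assumption to check.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table; the real columns will differ.
SAMPLE_HTML = """
<table class="sortable">
  <tr><th>State</th><th>Senator</th><th>Party</th></tr>
  <tr><td>Alabama</td><td>Senator A</td><td>Party X</td></tr>
  <tr><td>Alaska</td><td>Senator B</td><td>Party Y</td></tr>
</table>
"""


def table_rows(html):
    """Return one list of cell strings per row that contains td cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "sortable"})
    rows = []
    for tr in table.find_all("tr"):
        # get_text(strip=True) drops the surrounding whitespace of each cell
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # rows made only of th cells give an empty list; skip them
            rows.append(cells)
    return rows


for row in table_rows(SAMPLE_HTML):
    print(row)
```

To try it against the real page, pass `html.text` from the requests call above into `table_rows` and inspect a few rows before relying on any particular index.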

Upvotes: 1
