Reputation: 3
I am using Beautiful Soup to parse an HTML table.
I am running into an issue when trying to use the findAll method to find the columns within my rows. I get an error that says the list object has no attribute findAll. I found this method through another post on Stack Exchange, and it was not an issue there. (BeautifulSoup HTML table parsing)
I realize that findAll is a method of BeautifulSoup, not of Python lists. The weird part is that findAll works when I find the rows within the table (I only need the 2nd table on the page), but it fails when I attempt to find the columns in the rows list.
Here's my code:
from urllib.request import URLopener
from bs4 import BeautifulSoup
opener = URLopener() #Open the URL Connection
page = opener.open("http://www.labormarketinfo.edd.ca.gov/majorer/countymajorer.asp?CountyCode=000001") #Open the page
soup = BeautifulSoup(page)
table = soup.findAll('table')[1] #Get the 2nd table (index 1)
rows = table.findAll('tr') #findAll works here
cols = rows.findAll('td') #findAll fails here
print(cols)
Upvotes: 0
Views: 6500
Reputation: 1124928
findAll() returns a result list, so you need to loop over it, or pick one element from it, to get to a contained element with its own findAll() method:
table = soup.findAll('table')[1]
rows = table.findAll('tr')
for row in rows:
    cols = row.findAll('td')  # call findAll on each row, not on the rows list
    print(cols)
or pick one row:
table = soup.findAll('table')[1]
rows = table.findAll('tr')
cols = rows[0].findAll('td') # columns of the *first* row.
print(cols)
Note that findAll is deprecated; you should use find_all() instead.
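As a minimal sketch of the per-row approach written with find_all(), the snippet below parses a small inline HTML string instead of fetching the page from the question. The markup and the company and city names in it are made up for illustration, and "html.parser" is just the standard-library parser chosen so the example has no extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page from the question is not fetched here.
html = """
<table><tr><td>ignored</td></tr></table>
<table>
  <tr><td>Acme Corp</td><td>Oakland</td></tr>
  <tr><td>Widget Inc</td><td>Fremont</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find_all("table")[1]  # second table, as in the question
rows = table.find_all("tr")        # a result list of Tag objects

# find_all is called on each individual row (a Tag), never on the list itself.
data = [
    [td.get_text(strip=True) for td in row.find_all("td")]
    for row in rows
]
print(data)
```

Here data ends up as one list of cell strings per row, which is usually easier to work with than the raw td tags.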
Upvotes: 4