Reputation: 903
How can I retrieve all td information from this HTML data:
<h1>All staff</h1>
<h2>Manager</h2>
<table class="StaffList">
<tbody>
<tr>
<th>Name</th>
<th>Post title</th>
<th>Telephone</th>
<th>Email</th>
</tr>
<tr>
<td>
<a href="http://profiles.strx.usc.com/Profile.aspx?Id=Jon.Staut">Jon Staut</a>
</td>
<td>Line Manager</td>
<td>0160 315 3832</td>
<td>
<a href="mailto:[email protected]">[email protected]</a> </td>
</tr>
</tbody>
</table>
<h2>Junior Staff</h2>
<table class="StaffList">
<tbody>
<tr>
<th>Name</th>
<th>Post title</th>
<th>Telephone</th>
<th>Email</th>
</tr>
<tr>
<td>
<a href="http://profiles.strx.usc.com/Profile.aspx?Id=Peter.Boone">Peter Boone</a>
</td>
<td>Mailer</td>
<td>0160 315 3834</td>
<td>
<a href="mailto:[email protected]">[email protected] </a>
</td>
</tr>
<tr>
<td>
<a href="http://profiles.strx.usc.com/Profile.aspx?Id=John.Peters">John Peters</a>
</td>
<td>Builder</td>
<td>0160 315 3837</td>
<td>
<a href="mailto:[email protected]">[email protected]</a>
</td>
</tr>
</tbody>
</table>
Here's my code that generated an error:
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.findAll('table', attrs={'class': 'StaffList'})
list_of_rows = []
for row in table.findAll('tr'):  # 2 rows found in table - loop through
    list_of_cells = []
    for cell in row.findAll('td'):  # each cell in a row
        text = cell.text.replace(' ', '')
        list_of_cells.append(text)
    # print list_of_cells
    list_of_rows.append(list_of_cells)
# print all cells in the two rows
print list_of_rows
Error message:
AttributeError: 'ResultSet' object has no attribute 'findAll'
What do I need to do to make the code output all the information in the two web tables?
Upvotes: 1
Views: 79
Reputation: 903
Thanks for the suggestions, guys. The problem is now solved after replacing two lines of code:
The first one:
table = soup.findAll('table', attrs={'class': 'StaffList'})
replaced with:
table = soup.findAll('tr')
The second one:
for row in table.findAll('tr'):
replaced with:
for row in table:
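One caveat with this fix: soup.findAll('tr') collects every tr in the whole document, not only those inside StaffList tables, and header rows (which hold th cells only) come back as empty lists. A minimal sketch of that behaviour, assuming Python 3 and bs4; the HTML string and the "Other" table are made-up sample data, not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical page: one StaffList table plus an unrelated table.
html = """
<table class="StaffList">
<tr><th>Name</th></tr>
<tr><td>Jon Staut</td></tr>
</table>
<table class="Other">
<tr><td>Unrelated</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')

# Every <tr> in the document is visited, whatever table it belongs to.
all_rows = [[td.get_text(strip=True) for td in tr.find_all('td')]
            for tr in soup.find_all('tr')]
print(all_rows)
```

If the page only ever contains StaffList tables this makes no difference, but the empty lists from the header rows may still need filtering out.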
Upvotes: 0
Reputation: 2568
The problem starts at this line:
table = soup.findAll('table', attrs={'class': 'StaffList'})
findAll returns a ResultSet (a list of matching elements), which has no findAll attribute of its own.
Simply change findAll to find:
table = soup.find('table', attrs={'class': 'StaffList'})
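Note that find returns only the first matching table, so with the HTML in the question the Junior Staff rows would be left out. A minimal sketch of one way around that, assuming Python 3 and bs4: keep the ResultSet and loop over each table in it. The HTML string is a trimmed stand-in for the page, and find_all is the bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the page in the question (sample data only).
html = """
<table class="StaffList"><tbody>
<tr><th>Name</th><th>Post title</th></tr>
<tr><td><a href="#">Jon Staut</a></td><td>Line Manager</td></tr>
</tbody></table>
<table class="StaffList"><tbody>
<tr><th>Name</th><th>Post title</th></tr>
<tr><td><a href="#">Peter Boone</a></td><td>Mailer</td></tr>
<tr><td><a href="#">John Peters</a></td><td>Builder</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, 'html.parser')

list_of_rows = []
# find_all returns a ResultSet (a list of tags), so iterate over it
# instead of calling find_all on the ResultSet itself.
for table in soup.find_all('table', attrs={'class': 'StaffList'}):
    for row in table.find_all('tr'):
        cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
        if cells:  # header rows contain only <th>, so they yield no cells
            list_of_rows.append(cells)

print(list_of_rows)
```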
Upvotes: 2
Reputation: 89295
Alternatively, you can use a CSS selector expression to return the tr elements from the StaffList tables without having to extract the table first:
for row in soup.select('table.StaffList tr'): #2 rows found in table -loop through
......
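For illustration, the loop body might be filled in along these lines. This is only a sketch, assuming Python 3 and bs4; the HTML string is cut-down sample data based on the question, and get_text(strip=True) is used in place of replace(' ', '') so that spaces inside a value such as the phone number are kept.

```python
from bs4 import BeautifulSoup

# Cut-down sample data modelled on the question's HTML.
html = """
<h2>Manager</h2>
<table class="StaffList"><tbody>
<tr><th>Name</th><th>Telephone</th></tr>
<tr><td><a href="#">Jon Staut</a></td><td>0160 315 3832</td></tr>
</tbody></table>
<h2>Junior Staff</h2>
<table class="StaffList"><tbody>
<tr><th>Name</th><th>Telephone</th></tr>
<tr><td><a href="#">Peter Boone</a></td><td>0160 315 3834</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, 'html.parser')

rows = []
# The selector matches every <tr> nested in any <table class="StaffList">.
for row in soup.select('table.StaffList tr'):
    cells = [td.get_text(strip=True) for td in row.select('td')]
    if cells:  # skip header rows, which have <th> cells only
        rows.append(cells)

print(rows)
```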
Upvotes: 1