user27976

Reputation: 903

Extracting contents of two tables from web data

How can I retrieve the contents of every td cell from this HTML?

<h1>All staff</h1>
<h2>Manager</h2>
<table class="StaffList">
    <tbody>
        <tr>
            <th>Name</th>
            <th>Post title</th>
            <th>Telephone</th>
            <th>Email</th>
        </tr>
        <tr>
            <td>
                <a href="http://profiles.strx.usc.com/Profile.aspx?Id=Jon.Staut">Jon Staut</a>
            </td>
            <td>Line Manager</td>
            <td>0160 315 3832</td>
            <td>
                <a href="mailto:[email protected]">[email protected]</a> &nbsp;</td>
        </tr>
    </tbody>
</table>
<h2>Junior Staff</h2>
<table class="StaffList">
    <tbody>
        <tr>
            <th>Name</th>
            <th>Post title</th>
            <th>Telephone</th>
            <th>Email</th>
        </tr>
        <tr>
            <td>
                <a href="http://profiles.strx.usc.com/Profile.aspx?Id=Peter.Boone">Peter Boone</a>
            </td>
            <td>Mailer</td>
            <td>0160 315 3834</td>
            <td>
                <a href="mailto:[email protected]">[email protected]&nbsp;</a>
            </td>
        </tr>
        <tr>
            <td>
                <a href="http://profiles.strx.usc.com/Profile.aspx?Id=John.Peters">John Peters</a>
            </td>
            <td>Builder</td>
            <td>0160 315 3837</td>
            <td>
                <a href="mailto:[email protected]">[email protected]</a>
            </td>
        </tr>
    </tbody>
</table>

Here is my code, which raises an error:

response =requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.findAll('table', attrs={'class': 'StaffList'})

list_of_rows = []
for row in table.findAll('tr'): #2 rows found in table -loop through
    list_of_cells = []
    for cell in row.findAll('td'): # each cell in a row
        text = cell.text.replace('&nbsp','')
        list_of_cells.append(text)
    #print list_of_cells
    list_of_rows.append(list_of_cells) 
#print all cells in the two rows
print list_of_rows 

Error message:

AttributeError: 'ResultSet' object has no attribute 'findAll'

What do I need to do to make the code output all the information in the two web tables?

Upvotes: 1

Views: 79

Answers (3)

user27976

Reputation: 903

Thanks for the suggestions, everyone. The problem is now solved after replacing two lines of code:

The first one:

table = soup.findAll('table', attrs={'class': 'StaffList'})

replaced with:

table = soup.findAll('tr')

The second one:

for row in table.findAll('tr'):

replaced with:

for row in table:
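
One caveat with this fix, sketched below on an assumed miniature page (the `Other` table is hypothetical and not part of the original HTML): searching the whole document for `tr` also picks up header rows, which yield empty lists, and rows from any other table on the page. `find_all` is the current spelling of `findAll` in BeautifulSoup 4.

```python
from bs4 import BeautifulSoup

# Assumed miniature of the page: one StaffList table plus an unrelated table.
html = """
<table class="StaffList">
<tr><th>Name</th><th>Post title</th></tr>
<tr><td>Jon Staut</td><td>Line Manager</td></tr>
</table>
<table class="Other"><tr><td>unrelated</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

list_of_rows = []
for row in soup.find_all("tr"):  # every <tr> on the page, not only StaffList
    # Header rows hold <th> cells only, so they contribute an empty list.
    list_of_rows.append([cell.text for cell in row.find_all("td")])

print(list_of_rows)
# [[], ['Jon Staut', 'Line Manager'], ['unrelated']]
```

If the page contains only the two StaffList tables, this may be acceptable; otherwise filtering by table class first is the safer route.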

Upvotes: 0

Christos Papoulas

Reputation: 2568

The problem starts at this line:

table = soup.findAll('table', attrs={'class': 'StaffList'})

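findAll returns a ResultSet (a list-like collection of every matching element), and a ResultSet has no findAll method of its own.

To fix the error, change findAll to find: table = soup.find('table', attrs={'class': 'StaffList'})

Note that find returns only the first matching table, so this reads the Manager table alone. To read both tables, keep findAll and loop over the ResultSet.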
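
As a minimal sketch of handling both tables, the version below iterates over the ResultSet and searches each table in turn. The inline `html` string is an assumed, trimmed-down stand-in for the page in the question, and `get_text(strip=True)` is swapped in for the question's `replace('&nbsp','')`: the parser has already decoded `&nbsp;` to a non-breaking space character, so that replace has nothing to match.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the page in the question (assumed sample data).
html = """
<table class="StaffList"><tbody>
<tr><th>Name</th><th>Post title</th></tr>
<tr><td><a href="#">Jon Staut</a></td><td>Line Manager</td></tr>
</tbody></table>
<table class="StaffList"><tbody>
<tr><th>Name</th><th>Post title</th></tr>
<tr><td><a href="#">Peter Boone</a></td><td>Mailer&nbsp;</td></tr>
<tr><td><a href="#">John Peters</a></td><td>Builder</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

list_of_rows = []
# find_all returns a ResultSet, so iterate over it and search each table.
for table in soup.find_all("table", attrs={"class": "StaffList"}):
    for row in table.find_all("tr"):
        # strip=True trims surrounding whitespace, including the decoded &nbsp;.
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if cells:  # header rows contain only <th>, so skip their empty lists
            list_of_rows.append(cells)

print(list_of_rows)
# [['Jon Staut', 'Line Manager'], ['Peter Boone', 'Mailer'], ['John Peters', 'Builder']]
```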

Upvotes: 2

har07

Reputation: 89295

Alternatively, you can use a CSS selector to get the tr elements of the StaffList tables directly, without extracting the tables first:

for row in soup.select('table.StaffList tr'): #2 rows found in table -loop through
   ......
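
For illustration, a complete loop built on that selector might look like the following. The sample `html` is an assumed, shortened version of the page with only two columns, and skipping header rows is an added choice rather than part of the answer above.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the page in the question (assumed sample data).
html = """
<h2>Manager</h2>
<table class="StaffList"><tbody>
<tr><th>Name</th><th>Telephone</th></tr>
<tr><td><a href="#">Jon Staut</a></td><td>0160 315 3832</td></tr>
</tbody></table>
<h2>Junior Staff</h2>
<table class="StaffList"><tbody>
<tr><th>Name</th><th>Telephone</th></tr>
<tr><td><a href="#">Peter Boone</a></td><td>0160 315 3834</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

list_of_rows = []
# The selector matches <tr> descendants of every table with class StaffList.
for row in soup.select("table.StaffList tr"):
    cells = [td.get_text(strip=True) for td in row.select("td")]
    if cells:  # rows made only of <th> cells produce no data
        list_of_rows.append(cells)

print(list_of_rows)
# [['Jon Staut', '0160 315 3832'], ['Peter Boone', '0160 315 3834']]
```

Because the selector is scoped to `table.StaffList`, rows from unrelated tables on the page are not included.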

Upvotes: 1
