Reputation: 1962
I am using BeautifulSoup to parse an HTML page. I need to work on the first table in the page. That table contains a few rows; each row contains some 'td' tags, and one of those 'td' tags has an 'img' tag. I want to get all the information in that table, but when I print the table I don't get any data related to the 'img' tag.
I am using soup.findAll("table") to get all the tables, then choosing the first one for processing. The HTML looks something like this:
<table id="abc">
  <tr class="listitem-even">
    <td class="listitem-even">
      <table border="0"> <tr> <td class="gridcell">
        <img id="img_id" title="img_title" src="img_src" alt="img_alt" /> </td> </tr>
      </table>
    </td>
    <td class="listitem-even">
      <span>some_other_information</span>
    </td>
  </tr>
</table>
How can I get all the data in the table, including the 'img' tag? Thanks.
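A minimal sketch of the setup described above, assuming the markup is held in a string (the `HTML` name is only illustrative) and parsed with the standard-library `html.parser` backend. It shows two things: `find_all("table")` (the current spelling of `findAll`) also returns nested tables, in document order, so index 0 is the outer table; and an `img` tag has no text content, so text extraction such as `get_text()` omits it, while its data is still available through the tag's `attrs` dictionary.

```python
from bs4 import BeautifulSoup

# Compact copy of the example markup (hypothetical constant name).
HTML = (
    '<table id="abc"><tr class="listitem-even">'
    '<td class="listitem-even"><table border="0"><tr><td class="gridcell">'
    '<img id="img_id" title="img_title" src="img_src" alt="img_alt" />'
    '</td></tr></table></td>'
    '<td class="listitem-even"><span>some_other_information</span></td>'
    '</tr></table>'
)

soup = BeautifulSoup(HTML, "html.parser")

# find_all returns every table in document order, nested ones included.
tables = soup.find_all("table")
outer = tables[0]
print(len(tables))       # 2
print(outer.get("id"))   # abc

# get_text() only collects text nodes, so the img contributes nothing.
print(outer.get_text(" ", strip=True))  # some_other_information

# The img data lives in its attributes, exposed as a dict.
img = outer.find("img")
print(img.attrs)
```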
Upvotes: 0
Views: 2560
Reputation: 871
You have a nested table, so you need to check where you are in the tree before parsing the tr/td/img tags.
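from bs4 import BeautifulSoup

# Read the page; the with-block closes the file automatically
with open('test.html', 'rb') as f:
    html = f.read()

# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

for table in soup.find_all('table'):
    # Only process tables that are nested inside another table
    if table.find_parent('table') is not None:
        for tr in table.find_all('tr'):
            # Search the current row, not the whole table, to avoid duplicates
            for td in tr.find_all('td'):
                for img in td.find_all('img'):
                    print(img['id'])
                    print(img['src'])
                    print(img['title'])
                    print(img['alt'])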
Run against your example, it prints:
img_id
img_src
img_title
img_alt
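If you need everything in the first table rather than only the image, the same `find_parent` idea can be turned around. The following is a variation, not the code above, and assumes the markup is available as a string; `HTML` and `extract_rows` are hypothetical names. It walks only the rows that belong to the outer table, skipping rows of the nested table, and for each row collects the text of its own cells plus the attributes of any `img` inside it.

```python
from bs4 import BeautifulSoup

# Compact copy of the example markup (hypothetical constant name).
HTML = (
    '<table id="abc"><tr class="listitem-even">'
    '<td class="listitem-even"><table border="0"><tr><td class="gridcell">'
    '<img id="img_id" title="img_title" src="img_src" alt="img_alt" />'
    '</td></tr></table></td>'
    '<td class="listitem-even"><span>some_other_information</span></td>'
    '</tr></table>'
)


def extract_rows(table):
    """Return one dict per row of `table`, ignoring rows of nested tables.

    Each dict holds the stripped text of the row's own cells and the
    attributes of every <img> found anywhere inside the row.
    """
    rows = []
    for tr in table.find_all("tr"):
        # Skip rows whose closest enclosing table is a nested one.
        if tr.find_parent("table") is not table:
            continue
        # Keep only cells that belong directly to this row.
        cells = [td for td in tr.find_all("td") if td.find_parent("tr") is tr]
        rows.append({
            "texts": [td.get_text(" ", strip=True) for td in cells],
            "images": [dict(img.attrs) for img in tr.find_all("img")],
        })
    return rows


soup = BeautifulSoup(HTML, "html.parser")
rows = extract_rows(soup.find("table"))  # first table in the page
for row in rows:
    print(row["texts"])
    for attrs in row["images"]:
        print(attrs["id"], attrs["src"], attrs["title"], attrs["alt"])
```

The checks use `is` rather than `==` because bs4 compares tags with `==` by their markup, not by identity, so two structurally identical tables would otherwise be confused.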
Upvotes: 3