Reputation: 41
Here is part of the HTML of the page I am parsing:
<td>
<a class="soup" href="link">1</a>
</td>
<td>
<a class="soup" href="link">2</a>
<br>
<img src="/any.gif">
</br>
</td>
<td>
<a class="soup" href="link">3</a>
</td>
<td>
<a class="soup" href="link">4</a>
<br>
<img src="/any.gif">
</br>
</td>
<td>
<a class="soup" href="link">5</a>
</td>
Question: How do I get only those td elements that contain br and img?
UPD: I tried soup.find('img', {'src': '/any.gif'}).findPreviousSibling('a'), but it finds only one <a>.
(The main goal is to get only those <a> elements that are followed by <br><img></br>.)
Upvotes: 0
Views: 1290
Reputation: 1559
Just a small improvement to your code: find() returns only the first match, so loop over find_all() instead.
for img in soup.find_all('img', {'src': '/any.gif'}):
    # look up the preceding <a> once and reuse the result
    a = img.find_previous_sibling('a')
    if a is not None:
        print(a['href'])
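If you want the td cells themselves (as the question originally asked), a minimal sketch along these lines might work. It assumes BeautifulSoup 4 with the built-in "html.parser" backend; the html string is just the snippet from the question collapsed onto fewer lines, and the names cells and links are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (assumed representative of the real page).
html = """
<td><a class="soup" href="link">1</a></td>
<td><a class="soup" href="link">2</a><br><img src="/any.gif"></br></td>
<td><a class="soup" href="link">3</a></td>
<td><a class="soup" href="link">4</a><br><img src="/any.gif"></br></td>
<td><a class="soup" href="link">5</a></td>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only the cells that contain both a <br> and an <img>.
cells = [
    td for td in soup.find_all("td")
    if td.find("br") is not None and td.find("img") is not None
]

# From each matching cell, take its link.
links = [td.find("a", class_="soup") for td in cells]

print([a.get_text() for a in links])  # ['2', '4']
```

Filtering on the td keeps the check independent of the exact order of a, br and img inside the cell, whereas the sibling lookup above relies on the a coming before the img.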
Upvotes: 2