colicab

Reputation: 65

How to get the href from an element with a certain HTML class using Python BeautifulSoup

This is a piece of my HTML. It has already been parsed into a BeautifulSoup object named soup:

<center>

<!--[if lt IE 7]>
 <style type="text/css">
 div, img { behavior: url(http://www.addic7ed.com/js/iepngfix.htc) }
 </style>
<![endif]-->
<br /><center>
<!--Iframe Tag  -->

<!-- begin ZEDO for channel:  Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->

<iframe src="http://d2.zedo.com/jsc/d2/ff2.html?n=2051;c=59;s=22;d=14;w=728;h=90" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" allowtransparency="true" width="728" height="90"></iframe>

<!-- end ZEDO for channel:  Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->
</center><br /><div id="container"> 
        <table class="tabel70" border="0"><tr><!-- table header --><td class="tablecorner"><img src="http://www.addic7ed.com/images/tl.gif" /></td>
                <td></td>
                <td class="tablecorner"><img src="http://www.addic7ed.com/images/tr.gif" /></td>
            </tr><tr><td></td>
                <td>
<form action="/search.php" method="get">
<div align="center">
<input name="search" type="text" id="search" size="50" value="nikita 03x02" class="inputCool" />&#160;
 <input name="Submit" type="submit" class="coolBoton" value="Search" /><br /><b>1 results found</b> </div><br /><center><br /><form action="https://www.paypal.com/cgi-bin/webscr" method="post">
    <input type="hidden" name="cmd" value="_s-xclick" /><input type="hidden" name="hosted_button_id" value="EC7EPAVR5MXV6" /><input type="image" src="https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif" border="0" name="submit" alt="PayPal - The safer, easier way to pay online!" /><img alt="" border="0" src="https://www.paypal.com/en_US/i/scr/pixel.gif" width="1" height="1" /></form> <br /></center>
<br /><center>
<!--Iframe Tag  -->

<!-- begin ZEDO for channel:  Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->

<iframe src="http://d2.zedo.com/jsc/d2/ff2.html?n=2051;c=59;s=22;d=14;w=728;h=90" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" allowtransparency="true" width="728" height="90"></iframe>

<!-- end ZEDO for channel:  Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->
</center>
<br /><table class="tabel" align="center" width="80%" border="0"><tr><td><img src="images/television.png" /></td><td><a href="serie/Nikita/3/2/Innocence" debug="68217">Nikita - 03x02 - Innocence</a></td></tr><tr><p>
</p><p>
</p></tr></table></form></td>
                <td></td>
            </tr><tr><!-- table footer --><td class="tablecorner"><img src="http://www.addic7ed.com/images/bl.gif" /></td>
                <td></td>
                <td class="tablecorner"><img src="http://www.addic7ed.com/images/br.gif" /></td>
            </tr></table></div>

I would like to get the href (i.e. "serie/Nikita/3/2/Innocence") of the link inside the table with class="tabel", using BeautifulSoup and Python.

For the moment I can extract it using

soup.find(attrs = {'class':'tabel'}).find('a')['href']

But this seems a little convoluted. Is there a simpler (more Pythonic) way to get this URL?

Cheers

Upvotes: 0

Views: 1827

Answers (1)

Arovit

Reputation: 3699

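Try this. It parses only the <a> tags and reads the href of each one:

import urllib2
from BeautifulSoup import BeautifulSoup, SoupStrainer  # BeautifulSoup 3 API

page = urllib2.urlopen(url).read()  # url: address of the search results page
link_pat = SoupStrainer('a')  # tell the parser to keep only <a> tags
links = BeautifulSoup(page, parseOnlyThese=link_pat)
for link in links:
    href = link['href'].strip('/')  # raises KeyError if an <a> has no href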
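
If only the link inside the table with class "tabel" is wanted, a CSS selector may be shorter. This is a minimal sketch, assuming the newer bs4 package (BeautifulSoup 4) rather than the BeautifulSoup 3 API; the html string is a trimmed stand-in for the markup in the question, not the full page.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the page in the question (assumption: the real page
# has the same table/link structure).
html = """
<table class="tabel70"><tr><td><img src="images/tl.gif" /></td></tr></table>
<table class="tabel" align="center"><tr>
  <td><img src="images/television.png" /></td>
  <td><a href="serie/Nikita/3/2/Innocence" debug="68217">Nikita - 03x02 - Innocence</a></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# "table.tabel a" selects <a> tags nested in a <table> whose class list
# contains exactly "tabel" (so class="tabel70" does not match).
link = soup.select_one("table.tabel a")

# select_one returns None when nothing matches, so guard before indexing.
href = link["href"] if link is not None else None
print(href)
```

With a soup object that is already parsed, the whole lookup reduces to soup.select_one("table.tabel a")["href"], provided the link is known to exist.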

Upvotes: 2
