Mridang Agarwalla

Reputation: 45128

Unable to get correct link in BeautifulSoup

I'm trying to parse a bit of HTML and I'd like to extract the link that matches a particular pattern. I'm using the find method with a regular expression but it doesn't get me the correct link. Here's my snippet. Could someone tell me what I'm doing wrong?

from BeautifulSoup import BeautifulSoup
import re

html = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash; 
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash; 
</div>
"""

soup = BeautifulSoup(html)
print soup.find('a', href = re.compile(r".*title/tt.*"))['href']

I should be getting the third link (the one whose href contains title/tt), but BS always returns the first link. The href of the first link doesn't even match my regex, so why is it returned?

Thanks.

Upvotes: 3

Views: 760

Answers (2)

Katriel

Reputation: 123782

find returns only the first matching <a> tag. If you want every match, use findAll.
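The difference between a first-match lookup and an all-matches lookup can be sketched without BeautifulSoup at all. The block below is a rough illustration in Python 3 using only the standard library's html.parser, not the BeautifulSoup API; the names HrefCollector, matching_hrefs, and first_matching_href are hypothetical helpers made up for this example. It reuses the HTML and the regex from the question.

```python
import re
from html.parser import HTMLParser

HTML = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash;
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash;
</div>
"""


class HrefCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def matching_hrefs(html, pattern):
    """Return every <a> href the compiled pattern matches (the all-matches case)."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return [href for href in parser.hrefs if pattern.search(href)]


def first_matching_href(html, pattern):
    """Return only the first matching href, or None (the first-match case)."""
    matches = matching_hrefs(html, pattern)
    return matches[0] if matches else None


pattern = re.compile(r".*title/tt.*")
print(first_matching_href(HTML, pattern))  # a single string or None
print(matching_hrefs(HTML, pattern))       # a list, possibly empty
```

With the question's HTML only one href contains title/tt, so both helpers point at the same IMDB title link; the two only differ once several links match the pattern.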

Upvotes: 2

miku

Reputation: 188224

I can't answer your question directly, but the code you originally posted has an import typo. Change

import BeautifulSoup

to

from BeautifulSoup import BeautifulSoup

Then your output (using BeautifulSoup version 3.1.0.1) will be:

http://www.imdb.com/title/tt1196141/

Upvotes: 0
