Extracting links with href attribute in Python BeautifulSoup

Question

I have a simple task: extract the links from the HTML at a URL. Here is what I do:

> #!/usr/bin/python
> 
> import urllib
> import webbrowser
> from bs4 import BeautifulSoup
> 
> URL = "http://54.75.225.110/quiz"
> URL_end = "/question"
> 
> LINK = URL + URL_end
> file = urllib.urlopen("http://54.75.225.110/quiz/question")
> soup = BeautifulSoup(file)
> 
> for item in soup.find_all(href=True):
>     print item
> 
> 
> print 'Hey there!'

and this is the html:

> <a href="http://54.75.225.110/quiz/answer/56595">this page</a> (be quick).

Any idea why the only thing my script prints is "Hey there!"? If I change the loop to:

> for item in soup.find_all('a'):
>     print item

All I get is:

> this page

Why? Where is the "href" attribute?
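In BeautifulSoup 4, printing a Tag outputs its full markup (href included), `get_text()` returns only the text between the tags, and `get('href')` returns the attribute value (or None if it is missing). Printing the three separately is one way to check which of them is showing up. A minimal sketch, assuming Python 3 and bs4 4.x, using the anchor tag from the HTML above (the text around it is omitted):

```python
from bs4 import BeautifulSoup

# Assumed input: the anchor tag from the question's HTML
html = '<a href="http://54.75.225.110/quiz/answer/56595">this page</a> (be quick).'
soup = BeautifulSoup(html, "html.parser")

a = soup.find('a')
tag_markup = str(a)        # full markup of the tag, including the href attribute
link_text = a.get_text()   # only the visible text: "this page"
link_href = a.get('href')  # the attribute value, or None if the tag has no href

print(tag_markup)
print(link_text)
print(link_href)
```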

PepperoniPizza · Accepted Answer

I tested your HTML code using BeautifulSoup 4:

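from bs4 import BeautifulSoup

# html holds the anchor tag from the question
html = '<a href="http://54.75.225.110/quiz/answer/56595">this page</a> (be quick).'

soup = BeautifulSoup(html, "html.parser")

for a in soup.find_all('a'):
    if 'href' in a.attrs:  # skip <a> tags that have no href
        print(a['href'])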


Output:

http://54.75.225.110/quiz/answer/56595
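The href check can also be moved into the search itself: in BeautifulSoup 4, `find_all('a', href=True)` returns only the `<a>` tags that carry an href attribute. The sketch below assumes Python 3, where `urlopen` lives in `urllib.request` rather than `urllib`, and bs4 4.x; `extract_hrefs` and `fetch_hrefs` are hypothetical helper names. Keeping the parsing separate from the download also makes it possible to compare the HTML shown in the question with what the live URL actually returns.

```python
from urllib.request import urlopen  # Python 3 location (Python 2 used urllib.urlopen)

from bs4 import BeautifulSoup


def extract_hrefs(html):
    """Return the href value of every <a> tag that has one (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only tags where the href attribute is present
    return [a["href"] for a in soup.find_all("a", href=True)]


def fetch_hrefs(url):
    """Download url and extract its links (hypothetical helper; needs network access)."""
    with urlopen(url) as response:
        return extract_hrefs(response.read())  # BeautifulSoup accepts raw bytes


if __name__ == "__main__":
    sample = '<a href="http://54.75.225.110/quiz/answer/56595">this page</a> (be quick).'
    print(extract_hrefs(sample))
```

Only `extract_hrefs` runs in the example; calling `fetch_hrefs` on the question's URL performs a real request, and its result depends on what that server returns.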
