Alen

Reputation: 35

Beautifulsoup: how to get certain link from list?

With BeautifulSoup, how would one get the links from a webpage, store them in a list, and then print out a specific one? This is what I have so far:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://example.com/")
content = BeautifulSoup(html.read(), "html.parser")
for link in content.find_all("a"):
    print(link.get("href")[0])

But I get this error: TypeError: 'NoneType' object is not subscriptable. How can I solve this problem and get the first link?

Upvotes: 1

Views: 44

Answers (2)

Polydynamical

Reputation: 244

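To retrieve the absolute links from a page, filter the a tags with a regular expression on their href attribute. Tags that have no href never match such a filter, so this also avoids the TypeError.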

The following code should do it for you:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen("https://www.stmaryottumwa.org/")
content = BeautifulSoup(html.read(), "html.parser")
links = []

for link in content.find_all("a", attrs={'href': re.compile("^http")}):
    links.append(link.get("href"))

print(links[0]) # print first link on page

The variable links will contain every link on the page whose href starts with "http". Relative links (for example /about) are left out by the regex.
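A minimal offline sketch of the same regex filter, assuming a small hypothetical HTML string in place of the downloaded page, shows which tags the pattern keeps and which it skips:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample page used instead of a live URL
sample_html = """
<a href="https://example.com/first">First</a>
<a href="/about">Relative link</a>
<a name="anchor-without-href">No href</a>
<a href="http://example.org/second">Second</a>
"""

content = BeautifulSoup(sample_html, "html.parser")

# Only <a> tags whose href starts with "http" are matched;
# tags without an href attribute never match an attribute filter
links = [a.get("href") for a in content.find_all("a", attrs={"href": re.compile("^http")})]

print(links)     # ['https://example.com/first', 'http://example.org/second']
print(links[0])  # https://example.com/first
```

Because the list only ever holds strings, links[0] is the first matching URL rather than a character of it.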

Upvotes: 2

DeepSpace

Reputation: 81594

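An element's attributes are stored in its .attrs dict, and Tag.get('href') looks up that same dict. Keep in mind that some a tags do not have an href attribute at all. In that case .get returns None instead of raising, and indexing that None with [0] is what produces the TypeError. Check for None before using the value: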

link.attrs.get('href')
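A small sketch, assuming a hypothetical two-anchor snippet, shows that a missing href comes back as None from both lookups and that subscripting it reproduces the error from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one anchor with an href, one without
soup = BeautifulSoup('<a href="https://example.com/">x</a><a name="top">y</a>', "html.parser")
with_href, without_href = soup.find_all("a")

print(with_href.attrs.get("href"))     # https://example.com/
print(without_href.attrs.get("href"))  # None
print(without_href.get("href"))        # None (Tag.get reads the same attrs dict)

error_message = None
try:
    without_href.get("href")[0]  # subscripting None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'NoneType' object is not subscriptable
```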

I'm not sure what you expected [0] to do, since an a tag can only have a single href attribute. Using [0] on the href string gets you its first character.
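To illustrate the difference, here is a sketch with a hypothetical href value: indexing the string returns one character, while indexing a list of collected hrefs returns the first link.

```python
# Hypothetical href value, as link.get("href") might return it
href = "https://example.com/page"
print(href[0])  # h  (indexing a string gives a single character)

# Collecting the hrefs into a list first makes [0] mean "the first link"
hrefs = ["https://example.com/page", "https://example.com/other"]
print(hrefs[0])  # https://example.com/page
```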

for link in content.find_all("a"):
    href = link.attrs.get('href')  # None when the tag has no href
    if href:
        print(href)  # the whole URL; href[0] would print only its first character
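As an alternative to checking for None inside the loop, find_all and find can filter on the attribute directly with href=True, which keeps only tags that have an href. The sketch below assumes a hypothetical sample page in place of the downloaded HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for the downloaded HTML
sample_html = '<a name="top">No href</a><a href="/docs">Docs</a><a href="https://example.com/">Home</a>'
content = BeautifulSoup(sample_html, "html.parser")

# href=True keeps only <a> tags that actually have an href attribute
links = [link["href"] for link in content.find_all("a", href=True)]
print(links)  # ['/docs', 'https://example.com/']

# find() returns the first matching tag, or None if nothing matches
first = content.find("a", href=True)
if first is not None:
    print(first["href"])  # /docs
```

Unlike the regex approach, this keeps relative links too; pick whichever filter matches what "a link" means for your page.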

Upvotes: 2
