BeautifulSoup: how to get a certain link from a list?

Question

With BeautifulSoup, how would one get the links from a webpage, store them in a list, and then print out a certain one? This is what I have so far:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://example.com/")
content = BeautifulSoup(html.read(), "html.parser")
for link in content.find_all("a"):
    print(link.get("href")[0])

But I get this error:

TypeError: 'NoneType' object is not subscriptable

How can I solve this problem and get the first link?

Polydynamical · Accepted Answer

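The error comes from link.get("href"): it returns None for any <a> tag that has no href attribute, and subscripting None raises the TypeError. Even for tags that do have an href, [0] returns the first character of that URL string, not the first link on the page. Instead, keep only the tags that have a usable href, collect those hrefs in a list, and then index the list. This answer does the filtering with a regular expression that matches hrefs starting with "http".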
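A minimal, self-contained sketch of what goes wrong in the question's loop. It parses a small hypothetical HTML string instead of fetching a page, so the markup and the example.com URL are assumptions made only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet: the first <a> has no href attribute.
html = '<a name="top">Top</a><a href="https://example.com/page">Page</a>'
soup = BeautifulSoup(html, "html.parser")

# Tag.get() returns None for a missing attribute instead of raising.
hrefs = [link.get("href") for link in soup.find_all("a")]
print(hrefs)  # [None, 'https://example.com/page']

# Subscripting None is what raises the TypeError from the question.
try:
    hrefs[0][0]
except TypeError as err:
    print(err)  # 'NoneType' object is not subscriptable

# On a real href string, [0] is only its first character, not the first link.
print(hrefs[1][0])  # h

# To get the first link, index the collection of links instead,
# skipping tags that had no href.
first = next(h for h in hrefs if h is not None)
print(first)  # https://example.com/page
```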

The following code should do it for you:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen("https://www.stmaryottumwa.org/")
content = BeautifulSoup(html.read(), "html.parser")
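links = []

# Keep only <a> tags whose href starts with "http" (absolute links);
# tags without an href, or with a relative href, are skipped.
for link in content.find_all("a", attrs={"href": re.compile("^http")}):
    links.append(link.get("href"))

print(links[0])  # print the first matching link on the page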

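The list links will contain every link on the page whose href starts with "http", i.e. absolute URLs; relative links such as "/about" and <a> tags without an href are left out. Use links[n] to print the link at position n.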
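If relative links should be kept too, a common alternative (not part of the accepted answer) is to pass href=True to find_all, which matches any <a> tag that has an href attribute, and to resolve each href against the page URL with urllib.parse.urljoin. The base URL and HTML below are hypothetical stand-ins so the sketch runs without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL and page content (stand-ins for a real urlopen() call).
base_url = "https://example.com/"
html = """
<a name="top">Top</a>
<a href="/about">About</a>
<a href="https://example.org/">External</a>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually have an href attribute,
# so a["href"] is always a string here; urljoin turns relative hrefs
# into absolute URLs and leaves absolute ones unchanged.
links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]

print(links)  # ['https://example.com/about', 'https://example.org/']
print(links[0] if links else "no links found")  # first link, if any
```

To use it on a real page, pass the HTML returned by urlopen(url).read() to BeautifulSoup and use the same url as base_url.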
