How to find links within a specified class with Beautiful Soup

Question

I'm using Beautiful Soup 4 to parse a news site for links contained in the body text. I was able to find all the paragraphs that contain the links, but paragraph.get('href') returns None for each one. I'm using Python 3.5.1. Any help is really appreciated.

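from bs4 import BeautifulSoup
import urllib.request

url = "http://www.cnn.com/2016/11/18/opinions/how-do-you-deal-with-donald-trump-dantonio/index.html"
# BeautifulSoup parses markup, not URLs, so download the page first
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

for paragraph in soup.find_all("div", class_="zn-body__paragraph"):
    print(paragraph.get('href'))  # prints None for every paragraph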

Ilya V. Schurov · Accepted Answer

Do you really want this?

for paragraph in soup.find_all("div", class_="zn-body__paragraph"):
    for a in paragraph("a"):
        print(a.get('href'))

Note that paragraph.get('href') tries to find the attribute href in the <div> tag you found. As there's no such attribute, it returns None. Most probably you actually have to find all <a> tags which are descendants of your <div> (this can be done with paragraph("a"), which is a shortcut for paragraph.find_all("a")) and then, for every element, look at its href attribute.
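
To see the difference without hitting the network, here is a minimal sketch that runs against a small inline document. The markup in SAMPLE_HTML, the example.com URLs, and the helper name links_in_class are made up for illustration and are not taken from the real page; only the class name zn-body__paragraph comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the article page (an assumption,
# not the real CNN HTML).
SAMPLE_HTML = """
<div class="zn-body__paragraph">See <a href="http://example.com/one">one</a>
and <a href="http://example.com/two">two</a>.</div>
<div class="zn-body__paragraph">No links here.</div>
<div class="other"><a href="http://example.com/skip">skip</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def links_in_class(soup, css_class):
    """Collect href values of <a> tags nested inside <div class=css_class>."""
    hrefs = []
    for div in soup.find_all("div", class_=css_class):
        # href=True skips anchors that carry no href attribute
        for a in div.find_all("a", href=True):
            hrefs.append(a["href"])
    return hrefs


first_div = soup.find("div", class_="zn-body__paragraph")
print(first_div.get("href"))  # the <div> itself has no href attribute
print(links_in_class(soup, "zn-body__paragraph"))
```

If you prefer CSS selectors, soup.select("div.zn-body__paragraph a[href]") should pick out the same anchors in a single call.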
