I Like

Reputation: 1847

How to find links within a specified class with Beautiful Soup

I'm using Beautiful Soup 4 to parse a news site for links contained in the body text. I was able to find all the paragraphs that contain the links, but paragraph.get('href') returned None for each one. I'm using Python 3.5.1. Any help is really appreciated.

from bs4 import BeautifulSoup
import urllib.request
import re

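url = "http://www.cnn.com/2016/11/18/opinions/how-do-you-deal-with-donald-trump-dantonio/index.html"
# BeautifulSoup parses markup, not URLs, so download the page first
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")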

for paragraph in soup.find_all("div", class_="zn-body__paragraph"):
    print(paragraph.get('href'))

Upvotes: 1

Views: 1824

Answers (1)

Ilya V. Schurov

Reputation: 8087

Is this what you want?

for paragraph in soup.find_all("div", class_="zn-body__paragraph"):
    for a in paragraph("a"):
        print(a.get('href'))

Note that paragraph.get('href') looks for an href attribute on the <div> tag you found. Since that <div> has no such attribute, it returns None. Most probably you actually need to find all <a> tags that are descendants of your <div> (this can be done with paragraph("a"), which is a shortcut for paragraph.find_all("a")) and then look at the href attribute of each <a> element.
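To make the difference concrete, here is a minimal, self-contained sketch that runs without network access. The SAMPLE_HTML markup, the example.com URLs, and the helper name links_in_class are made up for illustration; only the class name zn-body__paragraph comes from the question, and the real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the article page (not the real CNN HTML)
SAMPLE_HTML = """
<div class="zn-body__paragraph">Read <a href="http://example.com/a">this</a>
and <a href="http://example.com/b">that</a>.</div>
<div class="zn-body__paragraph">No links in this paragraph.</div>
<div class="other"><a href="http://example.com/skip">not in the body</a></div>
"""


def links_in_class(html, css_class):
    """Return the href of every <a> nested inside a <div> with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for div in soup.find_all("div", class_=css_class):
        # href=True skips <a> tags that have no href attribute at all
        for a in div.find_all("a", href=True):
            hrefs.append(a["href"])
    return hrefs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_div = soup.find("div", class_="zn-body__paragraph")

# The <div> itself carries no href attribute, so .get() returns None
div_href = first_div.get("href")

# The links live on the <a> descendants of each matching <div>
links = links_in_class(SAMPLE_HTML, "zn-body__paragraph")

# A CSS selector is an alternative way to express the same query
selected = [a["href"] for a in soup.select("div.zn-body__paragraph a[href]")]

print(div_href)
print(links)
print(selected)
```

With this sample markup, the lookup on the <div> gives None, while both the nested find_all loop and the CSS selector collect the two links inside the zn-body__paragraph blocks and skip the link in the unrelated <div>.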

Upvotes: 3
