Reputation: 1847
I'm using Beautiful Soup 4 to parse a news site for links contained in the body text. I was able to find all the paragraphs that contain the links, but paragraph.get('href') returned None for each link. I'm using Python 3.5.1. Any help is really appreciated.
from bs4 import BeautifulSoup
import urllib.request

url = "http://www.cnn.com/2016/11/18/opinions/how-do-you-deal-with-donald-trump-dantonio/index.html"
# BeautifulSoup parses markup, not URLs, so download the page first
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
for paragraph in soup.find_all("div", class_="zn-body__paragraph"):
    print(paragraph.get('href'))
Upvotes: 1
Views: 1824
Reputation: 8087
Is this what you want?
for paragraph in soup.find_all("div", class_="zn-body__paragraph"):
    for a in paragraph("a"):
        print(a.get('href'))
Note that paragraph.get('href') looks for an href attribute on the <div> tag you found. Since the <div> has no such attribute, it returns None. Most likely you need to find all <a> tags that are descendants of your <div> (this can be done with paragraph("a"), which is a shortcut for paragraph.find_all("a")) and then read the href attribute of each <a> element.
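To see the difference without hitting the network, here is a minimal sketch that runs against a small inline snippet. The markup, the example.com URLs, and the extract_links helper are made up for illustration and only imitate the class name from the question; the real page may be structured differently. Passing href=True to find_all is an optional extra that skips <a> tags with no href at all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the article body; not the real CNN page.
SAMPLE_HTML = """
<div class="zn-body__paragraph">Read <a href="https://example.com/one">one</a>
and <a href="https://example.com/two">two</a>.</div>
<div class="zn-body__paragraph">No links here.</div>
<div class="zn-body__paragraph"><a name="anchor-only">No href</a></div>
"""


def extract_links(html):
    """Return the href of every <a> inside the matching <div> tags."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for paragraph in soup.find_all("div", class_="zn-body__paragraph"):
        # href=True keeps only <a> tags that actually carry an href attribute
        for a in paragraph.find_all("a", href=True):
            links.append(a["href"])
    return links


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_div = soup.find("div", class_="zn-body__paragraph")

# The <div> itself has no href attribute, so .get() returns None
div_href = first_div.get("href")

# The <a> descendants are where the href values live
links = extract_links(SAMPLE_HTML)

print(div_href)
print(links)
```

If you later need absolute URLs from relative hrefs, urllib.parse.urljoin with the page URL is one way to resolve them.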
Upvotes: 3